S76-2763a

HC

(NASA-CR-1[illegible]8154) A PRELIMINARY STUDY OF THE
STATISTICAL ANALYSES AND SAMPLING STRATEGIES
ASSOCIATED WITH THE INTEGRATION OF REMOTE
SENSING CAPABILITIES INTO THE CURRENT
AGRICULTURAL CROP FORECASTING (ECON, Inc.,

Unclas

15158

A PRELIMINARY STUDY OF THE STATISTICAL 
ANALYSES AND SAMPLING STRATEGIES 
ASSOCIATED WITH THE INTEGRATION OF 
REMOTE SENSING CAPABILITIES INTO 
THE CURRENT AGRICULTURAL CROP 
FORECASTING SYSTEM 





INCORPORATED


Report #75-127-1

NINE HUNDRED STATE ROAD
PRINCETON, NEW JERSEY 08540
609 92 1


A PRELIMINARY STUDY OF THE STATISTICAL 
ANALYSES AND SAMPLING STRATEGIES 
ASSOCIATED WITH THE INTEGRATION OF 
REMOTE SENSING CAPABILITIES INTO 
THE CURRENT AGRICULTURAL CROP 
FORECASTING SYSTEM 


Prepared for the 
Office of Applications 

National Aeronautics and Space Administration 
Contract No. NASW-2558


June 30, 1975 




ECONOMICS   OPERATIONS RESEARCH   SYSTEMS ANALYSIS   POLICY STUDIES   TECHNOLOGY ASSESSMENT



ABSTRACT


Remote sensing of agricultural croplands has been 
experimentally applied to the estimation of regional crop 
production statistics. Increasing accuracy and timeliness of 
the crop acreage component by remote sensing appears to be a 
major source of benefits from the new technology. Extending
the crop survey application from small experimental regions to 
state and national levels requires that a sample of agricul- 
tural fields be chosen for remote sensing of crop acreage, and 
that a statistical estimate be formulated with measurable char- 
acteristics. The critical requirements for the success of the 
application are reviewed in this report. The problem of 
sampling in the presence of cloud cover is discussed. Integra- 
tion of remotely sensed information about crops into current 
agricultural crop forecasting systems is treated on the basis 
of the USDA multiple frame survey concepts, with an assumed 
addition of a new frame derived from remote sensing. Evolution 
of a crop forecasting system which utilizes LANDSAT and future 
remote sensing systems is projected for the 1975-1990 time 
frame in this preliminary study. 


ii 



NOTE OF TRANSMITTAL 


This report on a preliminary study of statistical 
integration of remotely sensed crop data into existing crop 
survey systems is prepared for the Office of Applications, 
National Aeronautics and Space Administration under Contract 
NASW-2558. It is based on a review of the current state of
the art in remote sensing applications in agriculture and
current crop survey methods. This study is entirely independent
of the case studies in crop survey applications which are
reported in other volumes under NASW-2558. The conclusions of
the authors are their own, and do not necessarily reflect the
detailed results of the economic models reported separately
in the other volumes.




Project Director 



TABLE OF CONTENTS

                                                                 Page

Abstract                                                          ii
Note of Transmittal                                               iii
Table of Contents                                                 iv
List of Figures and Tables                                        vi

1.  Issues                                                        1-1
    1.1  Introduction                                             1-1
    1.2  Sampling Strategy                                        1-2
    1.3  Evolutionary Approaches                                  1-6
    1.4  Economic Issues                                          1-8

2.  The Interface Between LANDSAT and USDA/SRS                    2-1
    2.1  The Interface in 1975-1980                               2-1
         2.1.1  Acreage Interface                                 2-3
         2.1.2  Yield Integration                                 2-8
    2.2  The Interface in 1980-1990                               2-10

3.  Critical Requirements of the Integration                      3-1
    3.1  Sample Design                                            3-1
    3.2  Missing Data Due to Cloud Cover                          3-4
    3.3  Comparability of Satellite and Ground Survey Data        3-8
         3.3.1  Different Sampling Frames                         3-8
         3.3.2  Different Timing of Acquisition of Survey Data    3-11
    3.4  Uses of Ancillary Data in LANDSAT Applications
         to Crop Surveys                                          3-12

4.  Conclusions and Recommendations                               4-1
    4.1  Conclusions                                              4-1
    4.2  Specific Recommendations on Integration of Data          4-3


iv



TABLE OF CONTENTS (Continued)

                                                                 Page

         4.2.1  Techniques                                        4-3
         4.2.2  Evolutionary Approach to Integration              4-4

Appendix A  "Measurement of the Yield Component"                  A-1

Appendix B  Selected Quotes from "Scope and Methods of the
            Statistical Reporting Service," USDA
            Miscellaneous Publication No. 1308                    B-1

Appendix C  "The Remote Sensing of Bare Fields For Crop
            Acreage Estimation" - ECON Memorandum (1975)          C-1

Appendix D  "Sampling Problems in Remote Sensing Crop Survey
            Applications" - ECON Working Papers (1975)            D-1


v



LIST OF FIGURES AND TABLES

                                                                 Page

3.1  Comparison of Remote Sensing Techniques For Crop
     Classification                                               3-3

3.1  Maps of the U.S. Showing Frequency of Cloudiness             3-7

4.1  Logical Relationships Between Evolutionary Stages            4-6

A.1  Wheat Yield Factors                                          A-17

B.1  Regression Chart for Estimation of the States
     Winter Wheat Yield                                           B-6

B.2  Time Series Chart for Estimation of the States
     Stocks of Wheat                                              B-6

D.1  Cloud Cover Statistics by Weather Region by Month            D-4

D.2  Numbered Segments in a Wheat "Belt"                          D-7



1. ISSUES 


1.1 Introduction

In repeated experiments, investigators have successfully
applied LANDSAT multi-spectral digital data to the classification
of crop acreage in various narrowly defined agricultural
land areas. The idea that this application of LANDSAT
holds promise for inventorying crop production on a large
scale has gained increasing support over the past several years.

At present the Large Area Crop Inventory Experiment (LACIE) is 
testing this idea on a continental scale with the goal of a 90%
accurate crop production estimate at the 90% confidence level 
for selected major crops. In order to pass from experimental 
verification to an operational crop survey system incorporating 
the use of LANDSAT multi-spectral scanner (MSS) digital data, it 
is essential to plan the linkage of these data with other crop 
data currently available from the US Department of Agriculture 
(USDA) crop surveys, and with meteorological data for processing 
yield estimates. The purpose of this report is to examine the 
requirements for integrating LANDSAT data into USDA crop surveys 
to further the aim of achieving an improved crop survey system. 
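
The LACIE goal quoted above can be turned into a precision requirement under one common reading, which is an assumption here because the report does not define the criterion further: the estimate should fall within 10% of the true value with 90% probability. With an unbiased, approximately normal estimate, that reading fixes the largest coefficient of variation (CV) the estimate may have. A minimal Python sketch (the function name `max_cv` is hypothetical):

```python
from statistics import NormalDist


def max_cv(rel_error: float = 0.10, confidence: float = 0.90) -> float:
    """Largest coefficient of variation for which an unbiased, normally
    distributed estimate stays within +/- rel_error of the true value
    with the given two-sided confidence (an assumed reading of the goal)."""
    # Two-sided normal quantile, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return rel_error / z


if __name__ == "__main__":
    print(f"maximum CV under the assumed 90/90 reading: {max_cv():.4f}")
```

Under these assumptions the production estimate would need a CV of roughly 6%; a stricter confidence level lowers that ceiling.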

Reviewing the investigations completed to date, we
find that only the acreage measurement component of crop production
estimates has been developed far enough in LANDSAT


1-1



experiments to permit system design for
the integration of satellite and ground data into full-scale crop
inventories. Accordingly, the main part of this discussion is
limited to crop acreage measurements.

An open question is: Can LANDSAT data be used independently
of USDA crop survey data to prepare national and
state-level crop acreage estimates with acceptable accuracy? 
While this question is an issue of many ERTS investigations and 
experiments, the development of an improved USDA crop survey 
based upon satellite data integrated into a crop survey system 
would, in any case, be a necessary step in a well planned de- 
velopment program of the LANDSAT crop survey application. Thus, 
this report addresses the task of integrating LANDSAT data into 
the existing USDA crop surveys. Other tasks relating LANDSAT 
agricultural applications to more distant goals, including in- 
dependent yield estimation from satellite data and/or global 
crop surveys, are also discussed briefly. However, it is not 
possible here to do more than indicate feasible scenarios for 
this later stage of developing a comprehensive worldwide satellite
capability in agricultural surveys.

1.2 Sampling Strategy 

While LANDSAT observations might ultimately cover 
most of the surface of the earth as the satellite sweeps through
its 18-day cycle, the use of a complete census of agricultural 


1-2 



areas in crop inventories would be unnecessarily expensive. 

The objectives of a crop survey are: 

• to provide timely and accurate data on crop plant- 
ing, growth and harvesting 

• to permit statistical estimates of crop production 
to be made within acceptable confidence limits. 

Satisfying these objectives subsequently yields economic
benefits through the publication of crop reports containing
crop data and production estimates (or forecasts). It
follows that a cost-effective approach to the LANDSAT crop
survey application requires sampling the total crop area
(or, equivalently, selecting sample segments from the agricultural
land area observed by LANDSAT) for subsequent
processing into crop acreage and yield information.

USDA crop surveys use two basic kinds of samples 
in the current Statistical Reporting Service (SRS) procedures 
for collecting crop data. One, a probability sample, is 
based on area segments selected from aerial photographs of 
the farmlands. Complete and objective crop data are obtained 
within the sampled segments by enumerators. The other kind, 
a non-probability sample, is obtained by mailing questionnaires 
to farmers on a carefully compiled list at certain fixed 
times of year. Those farmers on the list who do respond
supply much detailed and valuable crop and livestock
information, which, however, cannot be checked and thus is


1-3 



more or less subjective. The total sampling is believed to
represent 0.6% of the farmlands (by area) in the United States.
Thus sampling error, the statistical variation between different
samples, is a major contribution to the total error in the
final estimates of crop production.
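
The link between sampling fraction and sampling error can be illustrated with a deliberately simplified sketch. It assumes simple random sampling of equal-sized segments without replacement, which is not the stratified SRS design, and the population size and standard deviation below are hypothetical. The standard error of the sample mean is then S / sqrt(n) times the finite population correction sqrt(1 - n/N).

```python
import math


def srs_standard_error(pop_sd: float, n_pop: int, fraction: float) -> float:
    """Standard error of a sample mean under simple random sampling
    without replacement, for a given sampling fraction."""
    n = max(1, round(n_pop * fraction))
    fpc = 1.0 - n / n_pop  # finite population correction
    return pop_sd / math.sqrt(n) * math.sqrt(fpc)


if __name__ == "__main__":
    # Hypothetical frame: 100,000 segments, between-segment SD of 50 acres.
    for fraction in (0.006, 0.05, 0.25):
        se = srs_standard_error(pop_sd=50.0, n_pop=100_000, fraction=fraction)
        print(f"sampling fraction {fraction:6.3f}: standard error {se:.3f} acres")
```

In this idealized setting the standard error shrinks roughly with the square root of the sample size, which is why a sampling fraction well above 0.6% is attractive if processing costs allow it.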

LANDSAT coverage of croplands is so extensive that
the sampled area could, in principle, be extended to almost any
desired fraction of the total area. In practice, however, there
are important considerations which limit this area fraction to
some figure less than 100%, although substantially larger than
0.6%:

• The presence of cloud cover reduces the sample
size obtained in any particular timespan by
LANDSAT.

• The processing costs per LANDSAT frame are
likely to be high, at least for early
systems, so that the total acreage sampled
must be kept to a modest level.

• The recommended approach toward development of the
LANDSAT crop survey capability is evolutionary.
Adjustments and refinements are easier to perform
on smaller scale systems.
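
The effect of cloud cover on the usable sample can be sketched with elementary probability. The 18-day repeat cycle is taken from the text; the assumption that cloud conditions on successive passes are independent, and the per-pass probability used below, are illustrative only.

```python
def passes_in_window(window_days: int, repeat_cycle_days: int = 18) -> int:
    """Number of complete repeat cycles (guaranteed overpasses of a given
    segment) that fit into an observation window."""
    return window_days // repeat_cycle_days


def prob_at_least_one_clear(p_clear: float, n_passes: int) -> float:
    """Probability that at least one of n_passes is cloud-free, assuming
    each pass is independently clear with probability p_clear."""
    return 1.0 - (1.0 - p_clear) ** n_passes


if __name__ == "__main__":
    n = passes_in_window(window_days=54)  # three passes in 54 days
    p = prob_at_least_one_clear(p_clear=0.3, n_passes=n)
    print(f"{n} passes, P(at least one cloud-free look) = {p:.3f}")
```

A segment that is not seen cloud-free within the window drops out of the sample, so the expected usable sample size is the selected sample size multiplied by this probability.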

The design of the sample must be prepared by statisticians
for efficient estimation of crop acreage within targeted
confidence limits. In cases of mixed agricultural areas,


1-4



multicrop sample design is preferable for reasons of efficiency.
The sample should be stratified, with strata chosen to represent
known intensities of agricultural activity and convenient
political boundaries, as with the USDA crop reporting districts
(CRD's). Provision must be made for the rejection of sample
segments after data acquisition, either because the cloud cover
obscures essential data such as training sites, or because
there are system-caused data losses in the segments. Then
resampling of these segments, or adjustment of the weights used for
the surviving sample segments in the estimation formula, will
be required.
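
The weight adjustment for rejected segments described above can be sketched as a stratified expansion estimator in which each stratum's expansion factor is recomputed from the surviving segments only. This is a minimal illustration, not the SRS or LACIE procedure: the class and function names are hypothetical, and it assumes that segments are rejected at random within a stratum (that is, cloud cover is unrelated to crop acreage).

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    n_frame: int         # total number of segments in the stratum
    sampled_acres: list  # crop acres per sampled segment; None = rejected


def stratified_total(strata) -> float:
    """Expand surviving sample segments to a regional total. Rejected
    segments are dropped and the expansion factor becomes
    n_frame / (number of usable segments)."""
    total = 0.0
    for s in strata:
        usable = [a for a in s.sampled_acres if a is not None]
        if not usable:
            raise ValueError("no usable segments in stratum; resampling needed")
        total += s.n_frame * sum(usable) / len(usable)
    return total


if __name__ == "__main__":
    strata = [
        Stratum(n_frame=100, sampled_acres=[10.0, 20.0, None]),  # one cloudy
        Stratum(n_frame=50, sampled_acres=[4.0, 6.0]),
    ]
    print(f"estimated crop acreage: {stratified_total(strata):.1f}")
```

A stratum left with no usable segments cannot be reweighted, which is the case where resampling, the other remedy named in the text, becomes necessary.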

Statistical estimation of crop acreage from the sample 
requires "expansion" of the crop acreage measured in the sample 
segments containing the crop to the regional, state or national 
reporting level. Inferences made along scientific lines carry 
a known confidence, and thus are useful for resource managers
seeking information about the crop. The final statistical es- 
timate of crop acreage should supply reliable, accurate informa- 
tion to be integrated with other crop survey data at the appro- 
priate level for the publication of crop production reports. 

Sources of statistical variability in MSS data 
obtained by LANDSAT* include the time of year, the degree of 
cloudiness, the crop planting schedule, and the sample design. 

*Assuming continuation of the present sun-synchronous
orbit, sun angle is not a significant source of variability
at a particular time of year.


1-5 



In development of the crop classification and acreage mensura- 
tion techniques, there are many statistical inference problems 
to be solved. These problems are conceptually distinct from 
the subject of this section which concerns the design of an 
area sample for selected crops and the estimation of regional, 
state or U.S. crop acreage from the sample. Nevertheless, due
to the complex nature of the data analysis, it may be found
convenient to combine statistical inference problems at all
levels from the pixel to the final large area estimate. This
approach is not in any way precluded by the discussions of this
section. There is, on the other hand, no necessity to attempt
the linkage at this time.

1.3 Evolutionary Approaches 

One approach to the development of a new technology
application such as remote sensing of agricultural crops is to
implement parallel systems (of crop forecasting) with the
intention of phasing out the less efficient system as soon as
possible. Another approach, which we recommend here, is to
use the new technology in conjunction with the existing system,
effecting a gradual integration of new and old data collection
and analytic techniques.

There is, at present, insufficient experience in ap- 
plying LANDSAT data to crop surveys for an integrated satellite- 
aircraft-ground truth crop inventory system to be fully and ac- 
curately specified. Yet the positive evidence accumulated to 


1-6 



date allows for a reasonable expectation that the LACIE and 
principal investigator results will lead to a first-generation 
operational crop acreage estimation system, at least for some 
crops and some geographic regions. The successful integration
of the LANDSAT crop information into existing USDA/SRS procedures
requires that the system development proceed
in an evolutionary manner in spite of some apparently revolutionary
aspects of the LANDSAT capability in agriculture.

The use of LANDSAT data to estimate leaf area index (LAI) 
or other yield correlatives may become significant one day, 
and thus be acceptable as a useful addition to the existing 
USDA yield measurement programs. But so long as the degree 
of correlation is still very weak, it is necessary to continue 
using the existing methods without modification, while at the 
same time implementing LANDSAT-based changes in the acreage
measurement programs. 

In order to achieve a fundamental change in agricultural
crop reporting accuracy and comprehensiveness, it will undoubtedly
be important to achieve a meaningful articulation between
LANDSAT measurements of crop acreage and USDA/SRS data
handling. This imposes, at the very least, a requirement for
a LANDSAT acreage reporting format which can be directly utilized
by SRS together with its other multiple-frame area
surveys. Timing of reports will also have to be considered.


1-7 


The evolutionary approach to the subject provides for incremental
steps to be taken which supply new crop survey information
to SRS only after thorough testing and demonstration of the reliability
of that information, and after agreement has been
reached with SRS regarding the format of the information.

1.4 Economic Issues

It has been determined by detailed economic analysis
that substantial benefits could be obtained by U.S. food consumers
as a result of improvements in crop production forecast
accuracy. The specific magnitude of the benefits has been obtained
in a concurrent ECON study* as a function of three parameters
of the total information system: the planted acreage
estimation accuracy, the frequency of measurement of planted
acreage and the data lag between the time of the measurement
and the issuance of a production (forecast) report. Clearly,
the values that these parameters take on are a function of
both the satellite system and the ground processing system.
(Measurement of the yield component of production is assumed
to continue at the current degree of accuracy.) The magnitude
of the disbenefits associated with errors in current USDA
crop production forecasts is estimated to be $211 million for
wheat and $40 million for soybeans annually. Hence, even small

*The Value of Domestic Production Information in Consumption
Rate Determination for Wheat, Soybeans, and Small Grains,
ECON, Inc., Report No. 75-127-3, Princeton, N.J., August 31, 1975.

1-8 



improvements in these forecasts could provide sufficient bene- 
fits to justify development of the new space-based capabilities 
required. 

The proposed application of LANDSAT agricultural crop
surveys requires implementation on a national (for U.S. crops)
and, perhaps, worldwide scale. The benefits estimated do not
accrue if the results of the application are not integrated into
a crop production reporting system and disseminated on a
non-discriminatory basis. Inasmuch as the application of LANDSAT data
provides reliable acreage estimates only (at least initially;
good yield estimates may follow later), there is no economic
basis at present for considering distribution of the LANDSAT
data other than through a statistical reporting service which
has the necessary capability to integrate LANDSAT data with the
other elements of crop production estimates in order to obtain
improved production estimates and forecasts.

In addition to the benefits associated with improved 
crop production estimates and forecasts obtained on a national 
scale, additional benefits would result from the dissemination 
of local and regional statistics, for example, at the state or 
county levels. Provisions for this secondary distribution can
also be made through the statistical reporting service responsible
for the national statistics, as much of the necessary
machinery already exists for cooperation between the various
concerned agricultural agencies. The development of marketable


1-9 



information products from the crop survey application can reasonably
be anticipated at present and is, of course, an important
economic issue; however, until the development of practical
LANDSAT data processing techniques is further along and such
products forthcoming, we envision that the new information would
be used mainly by governmental agencies responsible for assessing
crop production quantities and crop conditions. To wait
for the growth of private enterprises to process LANDSAT data
into marketable products might entail considerable loss of time,
during which benefits from a LANDSAT capability in agriculture
could have been realized through the public sector. On the
other hand, the growth of a market for specialized information
products and services derived from satellite images of agricultural
crop and rangeland may be expected to occur concurrently
with the improvement of the national and state crop production
estimates, and this market will undoubtedly be served partly
or wholly by private enterprises under the existing system for
the distribution and pricing of LANDSAT data.

The above issues notwithstanding, the major economic 
issue concerning the implementation of LANDSAT data into crop 
production estimates and forecasts deals with the present un- 
certainty in the technical capability that a LANDSAT type 
satellite-based system might offer and what computational (auto- 
matic) and manual treatment of the data are necessary to achieve 
this capability. To be sure, the implementation of LANDSAT data 


1-10 



into a crop production estimation and forecasting system is still
very much a topic of research, despite the fact that some rather
definitive statements might presently be made regarding the interim,
if not the ultimate, system capability. Thus, the problem
of an implementation schedule becomes quite important. Should
a system of lower capability be implemented early as opposed to
a system of higher capability delayed in time? To what extent
should the system rely on manual versus automatic processing
vis-a-vis area coverage, data lag and flexibility for system growth?
Should the initial capability be optimized, for example, to pro- 
duce the maximum net economic benefit or should the system be 
designed merely to meet certain institutional goals while allow- 
ing for added freedom of growth? A substantial policy analysis 
should be addressed to the potential implementation scenarios. 
This analysis should include a detailed cost and capability 
analysis of the alternatives and an analysis of the risks assoc- 
iated with each alternative. The remainder of this report sets 
the stage for such a study. 


1-11 



2. THE INTERFACE BETWEEN LANDSAT AND USDA/SRS

2.1 The Interface in 1975-1980

From the multiplicity of uses of LANDSAT imagery
reported in the scientific literature, there would appear to be
a bewildering array of choices for the organization of the crop
survey applications. However, there are pertinent facts concerning
the economics of the applications which narrow the
field of choice. In order to develop the applications in economically
viable ways, there are several prerequisites that must
be satisfied. We consider the primary requirements to be:

• The survey should measure economically important
aspects of agriculture, such as crop production
for a major crop at state or national levels.

• The results of the survey should be available to
all interested users in the agricultural community
in a timely fashion.

• The format of the information developed from LANDSAT
data should be acceptable to the users, which
implies that the presentation should be relatively
effortless to interpret.

• The processing of LANDSAT data should be done efficiently
to avoid excessive costs or delays.

• The statistical nature of crop survey information
requires a scientific application of statistical
techniques to ensure accuracy and high confidence
in the information.

2-1 



Within the guidelines of these constraining considerations,
the major choices for the crop survey applications
appear to be encompassed within the following questions:

1. Which crops to survey? 

2. Which geographic areas to cover and how complete 
does each coverage need to be? 

3. What are the crop measurements (statistics) to
derive from LANDSAT data? 

4. Who are the end users of the processed results? 

5. To what extent should the applications be locked 
in to existing institutional procedures for pub- 
lishing agricultural information? 

For the purposes of this preliminary study of the integration of 
LANDSAT applications with USDA/SRS procedures, the scope of our 
inquiry is further narrowed to an examination of the data handling 
and statistical problems under the following assumptions: 

• The responsible user agency will receive computer-compatible
tapes of geometrically and radiometrically
corrected LANDSAT data, or that agency will have in-house
capability to perform these preprocessing
corrections.

• The agricultural community will receive improved crop
production estimates and forecasts as a result of the
use of LANDSAT data in the crop surveys.

• The use of LANDSAT data for the crop surveys will be


2-2



efficiently organized, so as to avoid unnecessary errors
of interpretation, delays and costs in processing the
data, and to facilitate the achievement of the desired
goals in the improvement of crop production forecasting.

The interface can now be characterized through
an analysis of USDA/SRS crop production estimation procedures,
together with a review of results of principal investigations
using LANDSAT data to classify and measure crop statistics.

The crop production estimate is a product of two components: 
acreage harvested and yield per acre. These are sampled, mea- 
sured and estimated in separate programs by SRS. 
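
Because production is the product of two separately estimated components, the error of the production figure combines the errors of both. The sketch below assumes the acreage and yield estimates are unbiased and statistically independent; independence is an assumption suggested by, but not stated in, the description of separate programs. For independent estimates the relative variance of the product is CV_a^2 + CV_y^2 + CV_a^2 * CV_y^2. The numbers are hypothetical.

```python
import math


def production_estimate(acreage: float, cv_acreage: float,
                        yield_per_acre: float, cv_yield: float):
    """Return (production, cv_production) for the product of independent
    acreage and yield estimates with the given coefficients of variation."""
    production = acreage * yield_per_acre
    rel_var = cv_acreage ** 2 + cv_yield ** 2 + (cv_acreage * cv_yield) ** 2
    return production, math.sqrt(rel_var)


if __name__ == "__main__":
    # Hypothetical state-level figures: 1,000 acres (thousands), 30 bu/acre.
    prod, cv = production_estimate(1000.0, 0.03, 30.0, 0.04)
    print(f"production = {prod:.0f}, CV = {cv:.4f}")
```

The formula shows why improving acreage accuracy alone has a bounded payoff: once the acreage CV is well below the yield CV, the yield component dominates the production error.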

Most likely, early systems will be built to obtain 
improvements in crop acreage estimates until such time as crop 
yield estimation can be significantly enhanced through remote 
sensing of the crop. We will review the interface issue under 
acreage and yield headings separately. 

2.1.1 Acreage Interface 

The Agriculture Handbook No. 365* published by USDA
refers to the acreage estimates in the following terms.

"In general, the progression of acreage estimates
is from prospective plantings to acreage intended
for harvest to acreage actually harvested. Most
spring-sown field crops follow this sequence: (1)


* "Major Statistical Series of the U.S. Department of
Agriculture," Vol. 8, May 1971.

2-3



acreage intended for planting as of March 1, released
about mid-March; (2) acreage planted and
acreage for harvest, released with the midsummer
report; and (3) acreage planted and harvested,
released in the December Annual Crop Production
Summary. Fall-sown rye and winter wheat depart
from this sequence, with seeded acreage estimated
in December of the year preceding harvest, and
winter wheat acreage for harvest in May of the
next year."

"The total harvested acreage of many crops is broken
down into utilization groups. For example, although
the major use of corn and sorghum is for
grain, separate estimates are also made for the
acreage harvested for silage and for forage, including
acreage grazed or hogged."

"In general, acreage estimates are based on two
types of information: (1) acreage data for a
given crop season, obtained from the quinquennial
census of agriculture, state farm censuses, or some
other complete or nearly complete enumeration; and
(2) indicated acreages obtained by questionnaires
from samples of farms or processing plants."

"Major national surveys to collect data on acreages
of field crops and some seeds and vegetables are
conducted annually around March 1, June 1, and
during the fall. The March survey is in large
measure a nonprobability mail survey, whereas the
June and fall surveys are based upon both mail and
probability samples. Acreage utilization and production
data are also obtained for a number of
major crops on the fall survey."


From our point of view, an important feature of the
USDA/SRS methodology is the use of multiple-frame sampling.
Part of the sample used to prepare crop acreage estimates for
major crops is obtained from the list frame, the other part
from an area frame. The latter is a probability sample, while
the former is not. There are some farms which are unavoidably
included in both frames. Provided that the overlap portion is


2-4 



identified, this does not cause any problems of estimation. The
expansion of the overlap portion of the sample must be undertaken
separately to give the proper weights to these units. In
addition to being multiple-frame, the survey design is at the
same time stratified. The stratification is obtained by dividing
each state into strata according to intensity of agriculture;
then each stratum is further subdivided into sampling
units of variable size (about one square mile for very intensely
cultivated land).
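
One simple way to keep overlap farms from being counted twice can be sketched as follows. It is offered as an illustration of the multiple-frame idea only, not as the SRS estimator: farms on the list are expanded through the list frame alone, and the area frame contributes only farms that are not on the list. The function name, the treatment of the list sample as a simple random sample, and all numbers are assumptions.

```python
def multiple_frame_total(list_sample, list_frame_size,
                         area_sample, area_expansion):
    """Combine a list-frame expansion with an area-frame expansion of the
    non-overlap domain.

    list_sample:     acres reported by each sampled list farm
    list_frame_size: number of farms on the list frame
    area_sample:     (acres, on_list) pairs for farms found in sampled segments
    area_expansion:  area-frame expansion factor (segments in frame / sampled)
    """
    list_part = list_frame_size * sum(list_sample) / len(list_sample)
    # Farms flagged as on the list are excluded here so they count only once.
    nonoverlap_part = area_expansion * sum(
        acres for acres, on_list in area_sample if not on_list
    )
    return list_part + nonoverlap_part


if __name__ == "__main__":
    total = multiple_frame_total(
        list_sample=[100.0, 200.0], list_frame_size=10,
        area_sample=[(50.0, False), (80.0, True), (30.0, False)],
        area_expansion=20.0,
    )
    print(f"combined acreage estimate: {total:.0f}")
```

The sketch depends on the overlap being identified correctly, which is the proviso stated in the text; a farm wrongly flagged would be either double-counted or dropped.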

USDA uses aerial photography to construct area 
frames. The photographs are updated on approximately a 5-year 
cycle. Recently, the use of LANDSAT data has been proposed 
in the framing of the area sample.* Clear delineation of bound- 
aries of fields is necessary in constructing the area frame so 
that enumerators can identify these fields on the ground cor- 
rectly. The USDA evaluation of this application of remote 
sensing is expressed in "Scope and Methods of the Statistical 
Reporting Service," Miscellaneous Publication No. 1308. 

It is evident from the work of principal inves- 
tigators in the agricultural crop survey applications area 
that LANDSAT data can be used to construct independent acreage 
estimates for some crops, such as winter wheat, given the 
necessary amount of "training" data for the correct identification 

* Crop Identification and Acreage Measurement Utilizing
ERTS Imagery, William H. Wigton and Donald H. Von Steen in:
Third ERTS-1 Symposium (Dec. 1973) pp. 87-92.

2-5 



of the crop by the classifier system. Further research on the
classification of agricultural crop areas from LANDSAT data is
progressing, and it is not unreasonable to expect that the
capability to classify most of the major crops correctly from
cloud-free LANDSAT frames will be proven in the near future.
This capability might require repeated "looks" at the crop-growing
area to achieve an acceptable level of crop classification
accuracy. The use of spectral signatures to classify
crops and pixel counts to mensurate crop acreage is clearly a
different technology when compared with the current USDA program
for acreage estimation. In what way can this new technology
be used most cost-effectively to supplement and improve
the USDA acreage estimates? The interface, as it can be defined
today, is bounded on the one side by the statutory requirements
for the Crop Reporting Board to report timely and
accurate crop production figures at specified times within a
limited budget; on the other side by the uncertainties and unresolved
issues concerning the application of remote sensing
techniques using LANDSAT data to large area crop inventorying.
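
The mensuration step mentioned above, converting a count of pixels classified as a crop into acreage, reduces to multiplying by the ground area of one pixel. The sketch below assumes a nominal MSS pixel spacing of 57 m by 79 m; that figure is not given in this report and is used here only for illustration. It also ignores classification error and mixed boundary pixels.

```python
SQ_M_PER_ACRE = 4046.8564224  # exact definition of the international acre


def pixels_to_acres(pixel_count: int,
                    pixel_width_m: float = 57.0,
                    pixel_height_m: float = 79.0) -> float:
    """Convert a count of classified pixels into acres, given the assumed
    ground dimensions of one pixel."""
    return pixel_count * pixel_width_m * pixel_height_m / SQ_M_PER_ACRE


if __name__ == "__main__":
    count = 10_000  # hypothetical number of pixels classified as wheat
    print(f"{count} pixels is about {pixels_to_acres(count):.0f} acres")
```

With these assumed dimensions one pixel covers a little over one acre, so fields of only a few acres consist largely of boundary pixels, which is one reason classification accuracy depends on field size.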

In the research environment, where timeliness is not
a major factor, high accuracies have been reported for LANDSAT-based
independent crop acreage estimates within narrowly defined
limits of cartographic area and time of year.* These findings

*See, for example, Agricultural Inventory Capabilities
of Machine Processed LANDSAT Digital Data by Dietrick,
Egbert and Fries at NASA Earth Resources Survey Symposium,
June 1975 (Houston).

2-6 



relate to few crops and are not yet extended to statewide or
national crop acreage estimates. Whether it is feasible to do
so with the existing technology is still an open question. The
promising aspects of the LANDSAT application appear to reside in
the following points:

• LANDSAT possesses the capability to supply multispectral
images of a very large agricultural area
in a short span of time.

• LANDSAT data are objective.

• LANDSAT data are usually current, within the crop
cycle of the year of study, subject to cloud-free
scenes being obtained.

• LANDSAT images will most likely be amenable to
automatic interpretation and, through advanced
processing techniques, will most likely generate
crop acreage estimates of high accuracy within a
short timespan after data acquisition.

All of these considerations provide justification
for an intensive effort to do research and develop cost-effective
techniques for using LANDSAT data in the crop acreage
estimation program of the United States, either through direct
utilization by USDA or by another Federal agency acting in
concert with USDA. The interface itself can be more sharply
defined only by pursuing such investigations. A valuable
beginning is found in the LACIE effort, which will undoubtedly


2-7



reveal further promising achievements and, perhaps, also limitations
to the scope of LANDSAT applications to crop surveys.

2.1.2 Yield Integration 

The USDA yield program is described briefly as 
follows by the Agriculture Handbook No. 365: * 

YIELD AND PRODUCTION 

"Yield refers to production per acre measured 
in units such as pounds, bushels, hundredweight, and 
so on, whereas production relates to total units pro- 
duced. Forecasts and estimates of yields and quan- 
tities produced for crops are usually provided as of 
the first of each month during the growing season.

The preponderance of the forecasts and estimates 
fall within the period July 1 to December 1, but for 
crops not in season during this period, primarily 
vegetables, estimates are timed appropriately." 

"Forecasts and estimates are two distinct con- 
cepts. Forecasts refer explicitly to expectations 
of what is likely to be accomplished at some time in 
the future, such as a prediction of the yield or production
of an immature crop. Estimates generally
refer to a measure of accomplished fact, such as crop 
production at or after harvesttime."

"It should be clearly understood that a forecast
is a statement or report of the prospective yield or 
production, on the basis of known facts on a given 
date, assuming weather conditions and damage from in- 
sects or other pests during the remainder of the grow- 
ing season will be about the same as the average of 
previous years. Potential based on current conditions 
may be appraised accurately, but if weather or other 
conditions change, the actual outturn may differ some- 
what from the forecast. As a crop develops, crop 
reporters periodically submit appraisals of probable 
yield or production on their farms and in their 
localities, and the averages of these reported data 
are translated into forecasts by the Crop Reporting 
Board."

*"Major Statistical Series of the U.S. Department
of Agriculture," Vol. 8, May 1971.

"Monthly forecasts and end-of-year estimates for
several crops in many States are also based on objective
yield survey data. In the objective yield surveys,
trained enumerators visit selected fields and
orchards chosen on a probability basis to make counts 
and measurements of plants and fruit characteristics 
on small plots located in sample fields or in sample 
trees. This is done during the growing season for 
indications of the probable final yield when the crop 
is mature and harvested. At harvest time actual 
yields in the sample plots are measured, and sample 
plots are gleaned after harvest to measure harvesting 
losses. From these sample results, forecasts and 
actual yields are computed along with sampling errors
and these are made available to the Crop Reporting
Board for making estimates."

"When final survey indications and all check 
data for a crop become available, usually some months 
after completion of harvest, the official estimates 
of production are reviewed and revised, if necessary. 
Annual revisions are scheduled in advance and are 
released at essentially the same time every year." 

The determination of the expected yield per acre,
even for such a widely studied crop as wheat, is a complex and
difficult task. There are numerous factors affecting plant
growth, and the use of models to obtain regional (state) or
specific (local) predictions of yield is far from being perfected.
For a detailed review of the issues we refer to the report of the
Goddard Task Force on Agricultural Forecasting (GTFAF),* selections
from which are reproduced in the Appendix to this report.
Some of the difficulties relate to the complexity of the relationship
between yield and the crop growth factors. Other
difficulties are met in the data collection area. Meteorological
data, already being collected by satellites, can provide
some inputs for AGROMET yield determination models (see GTFAF).
It appears likely that crop stress factors which limit yield
can be detected** and measured by LANDSAT. Further assistance
to the yield program may be provided from LANDSAT images by
detection of crop abandonment. According to our literature
survey, to date no demonstration has been made of a capability
to measure the yield per acre of a crop from high-altitude
remote sensing data. Numerous studies indicate that valuable
inputs to yield estimation models may be obtainable from satellites,
particularly the weather satellites, but also including
LANDSAT. For the present purpose, the interface must be
characterized by those factors, related to yield, which are
partially or wholly measurable by analysis of LANDSAT data.

*"The Use of the Earth Resources Technology Satellite
(ERTS) for Crop Production Forecasts," Draft Final Report,
Task Force on Agricultural Forecasting, edited by D.B. Wood,
NASA Goddard Space Flight Center, July 24, 1974.

**"Wheat: Its Growth and Disease Severity as Deduced
From ERTS-1," E.T. Kanemasu, C.L. Niblett, H. Manges, D. Lenhart,
M.A. Newman, in Remote Sensing of Environment 255-260 (1974).

2.2 THE INTERFACE IN 1980-1990

Anticipating the evolution of a satellite-based
remote-sensing applications system for crop surveys, in the
manner described previously (Section 1.3), one arrives at a different
perspective of the interface. If one postulates an operational
system for automatic classification of agricultural crops in
all geographic units of the United States from satellite
remotely sensed data, with the concomitant acreage mensuration
of high accuracy, available on a 24- to 48-hour basis, the user
agency would be able to use this information to replace
older and less cost-effective survey techniques, as well as
to derive new information products at the local level.
While we hesitate to predict which techniques might be replaced
or which new products created, the conclusion, as far as the
interface is concerned, must be that such a system could become
an integral part of the crop surveys after 1980, rather than a
superficial addition to the multiframe survey system of today.

Beyond integrating acreage estimation data from 
LANDSAT and successor systems into the crop surveys, there 
remains a host of potential applications which may provide 
early warning information on crop conditions or survey informa- 
tion on other aspects of agricultural activity. These applica- 
tions would need to be handled individually with due consideration 
for user demand and institutional charter, but we will not 
attempt to pursue the topic any further than that. Some of 
them may prove suitable for commercial exploitation; others 
may require new agency arrangements; still others may fit into 
the organizational framework of existing agencies such as 
USDA/SRS.





3. CRITICAL REQUIREMENTS OF THE INTEGRATION 


3.1 Sample Design

The sample of area segments within the U.S. agricultural
lands which are to be registered, classified and
measured by processing LANDSAT data can be considered as a
mechanism for selecting a manageable portion of the vast amount
of data acquired. Processing of all relevant* data in a timely
and cost-effective way is an option to be evaluated. This would
provide a census of the agricultural land, but not a total
census, in that some areas will be excluded by cloud cover and
fields which are too small for the classifier will also be lost.
A scientifically designed statistical sample of the agricultural
land is an alternative option which is likely to prove cost-effective.
Design criteria of the sample which should be taken
into account are:

(1) the size of the region for which the sample is
intended: U.S. nation, 48 coterminous states,
one state, crop reporting district, county, etc.,

(2) the intensity of agricultural activity relating
to the crops of interest,

(3) the probability of obtaining a cloud-free LANDSAT
frame, or sufficient cloud-free area within
the frame,

(4) the number of LANDSAT passes which must be used
to construct the sample,

(5) the acceptable level of sampling error,

(6) the need for training sites for the classifier
within the sample segments.

*Obviously data pertaining to cities, mountains,
lakes, deserts, etc. can be excluded.

Some of these points, such as (1) and (5), relate to 
objectives of the survey. Others, such as (2) and (3), relate 
to the physical state of the region and its atmosphere at the 
time of the survey. The remainder, (4) and (6), relate to the 
techniques used for registration, classification and mensuration 
of crop acreages. Each of the issues - survey objectives, phys- 
ical state of the environment and measurement techniques - 
must be resolved fully at the time of survey implementation. 

One of the technical issues to be resolved concerns 
the use of agricultural fields as an integral part of the acre- 
age classification and measurement processing of LANDSAT im- 
ages. The choice of technique in this area has some bearing 
on sample design since efficient sampling and estimation would
require knowledge of field size distribution if fields are used 
as a structural basis for crop classification. The following 
table presents a brief overview of the comparative advantages 
of two methods. 

Table 3.1 Comparison of Remote Sensing Techniques for Crop Classification

Field Classifier                    Pixel Classifier
----------------------------------  ----------------------------------
Lack of knowledge of field          Field size distributions
size distributions                  not needed

Fields are useful for               Clustering of contiguous
classification of crops in that     pixels can be done to a
they provide spatial context        limited extent - some of
                                    the spatial context is lost

Reduction of database size          Simpler structure of a
by using fields                     "coordinate grid" database

Variations within fields can        Classification of isolated
lead to increased mensuration       pixels may cause bias in
error if they are not fully         estimates - fractional pixel
accounted for (e.g., small          classification is difficult
ponds, bare patches, etc.)

The sample design itself can be undertaken without
undue difficulty once the major issues outlined above have been
resolved. Following accepted survey techniques, one would
stratify the population with strata defined on the basis of
known agricultural practices and crop calendars. Each stratum
would contain, for instance, a geographically contiguous area
containing a more or less known amount of activity relating to
the crops of interest. The segments or sample units would be
selected from within each stratum by one of two standard methods:
sample size proportional to stratum size (sampling the
same fraction of each stratum), or optimal allocation, taking
into account the variances of the measurements within strata
and the "cost" of sampling if any differences occur between
strata.
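
The two allocation methods just described can be sketched as follows. This is a minimal illustration and not a procedure taken from the report: the function names, the stratum sizes, the within-stratum standard deviations and the unit costs are all hypothetical, and the optimal rule is the textbook form in which the sample size in a stratum is proportional to N_h * S_h / sqrt(c_h).

```python
import math


def proportional_allocation(stratum_sizes, n_total):
    """Sample the same fraction of every stratum."""
    population = sum(stratum_sizes)
    return [n_total * size / population for size in stratum_sizes]


def optimal_allocation(stratum_sizes, stratum_sds, unit_costs, n_total):
    """Allocate in proportion to N_h * S_h / sqrt(c_h).

    stratum_sds are within-stratum standard deviations and unit_costs are
    per-segment sampling costs; both are assumed to be known in advance.
    """
    weights = [
        size * sd / math.sqrt(cost)
        for size, sd, cost in zip(stratum_sizes, stratum_sds, unit_costs)
    ]
    total_weight = sum(weights)
    return [n_total * w / total_weight for w in weights]


# Hypothetical strata: segment counts, acreage standard deviations, unit costs.
sizes = [1000, 3000, 6000]
sds = [40.0, 20.0, 10.0]
costs = [1.0, 1.0, 4.0]

proportional = proportional_allocation(sizes, 200)
optimal = optimal_allocation(sizes, sds, costs, 200)
print(proportional)
print([round(n, 1) for n in optimal])
```

In this made-up case the optimal rule moves sample units toward the small but highly variable first stratum and away from the large, uniform and costly third stratum. Allocations are left as fractions; rounding to whole segments is a separate step.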




An additional criterion which might be employed in
the sample design is cloud cover. The samples should be selected
to increase the probability of obtaining cloud-free samples from
areas which are frequently obscured by clouds, and these samples
should be weighted to reflect the relative scarcity of cloud-free
conditions. In order to do this it will clearly be necessary
to develop a database on regional cloud statistics by time
of year. This issue will be reviewed in the next section of this
report.

The total size of the sample will be determined by 
the economics of data acquisition and processing in relation to 
the objectives of the survey. If there is an institutional 
requirement to achieve a predetermined total error level, for 
example if the objectives of the LANDSAT application include 
obtaining a total crop acreage estimation error no larger than 
the currently existing value, then one may control the sampling 
error, E_S, in relation to the measurement error, E_M, to achieve
this total error level:
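
The relation itself is not reproduced in this copy of the report. As a hedged illustration only, the sketch below assumes that the sampling and measurement errors are independent and combine in quadrature, E_T^2 = E_S^2 + E_M^2. This is a common convention and not necessarily the one intended here; the symbols E_T, E_S and E_M and the function name are assumptions introduced for the example.

```python
import math


def allowable_sampling_error(total_error, measurement_error):
    """Largest sampling error E_S that still meets the total error target E_T.

    Assumes independent errors combining in quadrature:
    E_T**2 = E_S**2 + E_M**2.
    """
    if measurement_error >= total_error:
        raise ValueError("measurement error alone already exceeds the target")
    return math.sqrt(total_error ** 2 - measurement_error ** 2)


# Hypothetical figures, in percent of the acreage estimate.
print(allowable_sampling_error(5.0, 3.0))
```

Under this assumption, any increase in measurement error from remote sensing must be paid for by a larger sample, and no sample size can meet the target once the measurement error alone reaches it.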

3.2 Missing Data Due to Cloud Cover

Cloud cover can present a satellite remote-sensing
applications system with a critical problem. In the case of a
crop survey using sampling with fixed-area segments on a specified
date, the presence of cloud cover causes loss of significant
quantities of data, possibly all data on crops within the
segments. If the sampling is timed to capture LANDSAT images of
agricultural areas at a particular point of the crop cycle,
this loss may severely reduce the overall quality of the sample.
The results of crop acreage estimation derived from the sample
may suffer from two forms of distortion due to cloud cover:

• The sample may be biased, due to the unrepresentative
nature of the cloud-free portions of the
sample for which data were actually obtained.

• The sample may result in too high a level of
sampling error due to the effective reduction
in sample size by the cloud cover problem.

If it is possible within the time frame of the sampling pro- 
cedures, repeat observations on a later date should be obtained 
to minimize these distortions. Otherwise, there are two main 
alternative "safeguards" against distortion due to cloud cover:

• A sample using "floating" rather than fixed area
segments selected from the cloud-free portions of
the images.

• A sample that is overdesigned so that a cloud-free
subsample can be selected as necessary.
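
How far a sample must be overdesigned can be sketched with a simple binomial model. This is an illustration under loud assumptions: every segment is taken to be cloud-free with the same probability and independently of the others, which real cloud fields (being spatially correlated) do not satisfy, and the function names and numbers are hypothetical.

```python
import math


def prob_enough_clear(n_selected, n_required, p_clear):
    """P(at least n_required of n_selected segments are cloud-free)."""
    return sum(
        math.comb(n_selected, k) * p_clear ** k * (1 - p_clear) ** (n_selected - k)
        for k in range(n_required, n_selected + 1)
    )


def overdesigned_size(n_required, p_clear, assurance):
    """Smallest sample size giving the required cloud-free subsample
    with probability at least `assurance`."""
    if not (0 < p_clear <= 1 and 0 < assurance < 1):
        raise ValueError("p_clear must be in (0, 1], assurance in (0, 1)")
    n = n_required
    while prob_enough_clear(n, n_required, p_clear) < assurance:
        n += 1
    return n


# Hypothetical case: 50 usable segments wanted, each clear with probability 0.6.
print(overdesigned_size(50, 0.6, 0.95))
```

Because neighbouring segments tend to be clouded together, a figure obtained this way is better read as a lower bound on the overdesign needed than as a design value.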

Neither of the alternative safeguards guarantees a
satisfactory solution 100% of the time, although experience may
show that one or both of them work well enough to provide statistically
acceptable results. It is also clear that, from
LANDSAT survey data alone, crop acreage estimation for the
smaller geographic units, e.g., counties, can be rendered infeasible
by cloud cover if the data are narrowly limited in time.
Whenever the data are obtained from several passes of LANDSAT
the cloud cover problem is greatly reduced, and it is possible
to calculate the minimum required number of passes to obtain
a desired confidence level for the crop acreage estimate in
each geographic unit. The nature of this critical problem is
therefore one which allows solution only after the techniques
of crop classification from the LANDSAT data have been formally
specified. These technique specifications must be either:

• time-insensitive within a wide range of the
crop growth cycle, or

• based on a sample design which explicitly recognizes
the existence and geographical distribution
of cloud cover at the time the sample is obtained.

For the latter purpose, a detailed study of cloud statistics
would be required on a current basis for the time of year and
geographic region of interest. The 1969 study, "Cloud Statistics
in Earth Resources Technology Satellite (ERTS) Mission Planning"
by Vincent V. Salomonson, provides seasonal frequencies of 30%
or less cloudiness for the contiguous 48 states. Further detail
would be required to design cropland samples which recognize
cloud cover probabilities explicitly.
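
The calculation of a minimum number of passes mentioned above can be sketched as follows, assuming (as a simplification that is not taken from the report) that each pass offers an acceptably clear view with the same probability and independently of the other passes. The function name and the example probabilities are hypothetical.

```python
def min_passes(p_clear, confidence):
    """Smallest number of passes n such that at least one pass is
    acceptably cloud-free with probability >= confidence, i.e.
    1 - (1 - p_clear)**n >= confidence."""
    if not (0 < p_clear < 1 and 0 < confidence < 1):
        raise ValueError("p_clear and confidence must lie strictly between 0 and 1")
    n = 1
    prob_all_cloudy = 1 - p_clear
    while 1 - prob_all_cloudy < confidence:
        n += 1
        prob_all_cloudy *= 1 - p_clear
    return n


# Hypothetical seasonal frequencies of acceptably clear conditions.
for p in (0.3, 0.5, 0.8):
    print(p, min_passes(p, 0.95))
```

With an 18-day repeat cycle, the number of passes translates directly into the length of the acquisition window, so regions with a low frequency of clear conditions call for classification techniques that tolerate a correspondingly wide range of dates.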





Figure 3.1 Four maps of the United States showing the frequency in percent of 30%
or less cloudiness at 35 stations and the general locations where the
probability is > 0.8, 0.5-0.8, and < 0.5 of seeing 30% or less cloudiness
on at least 2 out of 5 passes during a season. The frequencies shown
were compiled for the four seasons by Smith and Shafman (1968) and are
based on ten years of record at each station.

Source: "Cloud Statistics in ERTS Mission Planning" by V. Salomonson, GSFC, 1969 



3.3 Comparability of Satellite and Ground Survey Data


There are two major differences between remote 
sensing surveys of crops and conventional surveys employed by 
USDA. 

(i) The sampling of farms or fields is based on
totally different "frames."*

(ii) The timing of LANDSAT data acquisition is
significantly different compared with the
USDA conventional surveys.

We will deal with each of these separately in this section as 
applied to the estimation of crop acreage. Discussion of inte- 
grated yield programs presents far more difficult problems be- 
cause of the complexity of the yield prediction models. 

3.3.1 Different Sampling Frames 

In one sense the difference in frames and sampling
units between a LANDSAT survey of agricultural areas and a conventional
enumeration or mail-out survey is no problem because
the USDA already uses a multiple-frame approach. However, when
one considers in detail the integration of the LANDSAT and conventional
surveys, one is faced with a critical requirement:

• to statistically combine acreage from the LANDSAT
data with acreage from the USDA enumerative
and mail-out surveys, one must be able to specify
how the LANDSAT acreage was sampled.

*We are not referring to LANDSAT image frames of 100
n.mi x 100 n.mi, but to the sampling frame which provides an
operationally useful definition of the population to the statistician
who must define the procedure by which samples are to be
selected from the population.

This requirement is not critical if:

(1) area segments are cartographically defined as a
sampling frame, and

(2) LANDSAT images are registered with respect to
those segments, and

(3) a probability sample of the segments is selected
for crop acreage classification and mensuration.

In this case, the results of the LANDSAT acreage
survey can be statistically integrated with the
results of the USDA enumerative surveys and mail-out
surveys using standard techniques - essentially
a weighted averaging procedure with the
weights determined in relation to the standard
errors of the estimates that are obtained from
the several sources of information. However,
if any one or more of the steps (1) - (3) outlined
above are not followed, for any reason,
then integrating the survey results may be
difficult.
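
One standard form of such a weighted averaging procedure is inverse-variance weighting, sketched below. It is offered as an illustration, not as the procedure the report or USDA has in mind: it assumes the estimates are unbiased and statistically independent, and the function name and acreage figures are hypothetical.

```python
import math


def combine_estimates(estimates, standard_errors):
    """Inverse-variance weighted average of independent, unbiased estimates.

    Returns the combined estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    combined = sum(w * x for w, x in zip(weights, estimates)) / total_weight
    combined_se = math.sqrt(1.0 / total_weight)
    return combined, combined_se


# Hypothetical state wheat acreage (thousand acres): LANDSAT, enumerative, mail-out.
acreage, se = combine_estimates([10400.0, 10150.0, 10600.0], [300.0, 200.0, 500.0])
print(round(acreage, 1), round(se, 1))
```

The combined standard error is never larger than the smallest input standard error, which is the sense in which an additional independent source can only help. Any bias in one source, such as systematic misclassification, is not removed by the weighting and would need to be handled separately.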

The estimates of crop acreage which might be obtained
independently from LANDSAT data have different statistical
characteristics from estimates derived by ground surveys. Apart
from the classification errors - such as confusion of similar
crops - and the cloud cover problem, they differ substantially
with regard to sampling errors. The total error of estimation
derives from several sources, only one of which is sampling
error. In USDA crop surveys based on enumeration of crop
acreages within area segments, the measurement error is very
low ( 0.5%), while the sampling error is much larger due to
the small fraction of total area sampled. When LANDSAT data
are processed for estimation of crop acreages, the measurement
error becomes a combination of several factors and is likely
to be larger than that of USDA enumerative crop surveys. On the other
hand, the sampling error will be reduced because the fraction
of croplands sampled can be substantially larger than in existing
surveys. Integration of LANDSAT data with USDA crop survey
data should be planned to take advantage of one of the main
virtues of LANDSAT images: their large area coverage. Needless
to say, the information in independent estimates of crop acreage
could be used in other ways to:

• check other survey results,
• develop new schedules of crop reporting,
• monitor progress in planting or harvesting.

From the economic studies of remote sensing satellites it does
not appear that these other uses would be cost-effective by
themselves. Once the system is developed for the agricultural
crop survey mission, however, a list of minor applications
becomes incrementally justifiable.





3.3.2 Different Timing of Acquisitions of Survey Data

The 18-day repeat cycle of each LANDSAT satellite 
permits, in principle, frequent updates of crop acreage esti- 
mates when compared with the reporting of crop data currently 
obtained by USDA. However, there are several factors which in 
practice will reduce the update frequency considerably: 

• classification of crops from LANDSAT images
with acceptable error levels may require multi-temporal
data,

• several repeat observations of the same area
may be needed to obtain sufficiently cloud-free
scenes,

• some crops will only be identifiable or distinguishable
from other crops at a particular time
of year in LANDSAT images.

Perhaps the most positive statement that can be
made about the LANDSAT frequency of data acquisition at the
present time is that it provides an opportunity to obtain some
crop acreage estimates on a monthly basis at state and perhaps
even county levels. While these would not be complete, they
would provide a new agricultural information service based on
LANDSAT images. Whether or not these monthly regional crop
acreage estimates would be immediately integrated with USDA/SRS
preliminary survey results, or held until the completion of the
annual crop survey, they would serve as a basis for improved
crop forecasting. The method of improvement would be either
through independent preparation of new forecasts based on LANDSAT
results, or through integration of those results with USDA
crop survey data.

3.4 Uses of Ancillary Data in LANDSAT Applications to Crop Surveys

Due to the special nature of the LANDSAT image
analysis procedures for classifying crops and mensurating crop
acreage, there is a need to use considerable ancillary data to
assist the classifier and to achieve maximum precision in the
results. There is (potentially) a critical requirement in
this matter due to the large amount of current agricultural
data which the multi-spectral image analysis system would require.
If one employs automatic (computerized) classification,
which is considered essential for a cost-effective
operational system, the ancillary data must be organized in a
computer databank and retrievable by the classification programs.
This will require a substantial amount of coding and
input of the ancillary crop data to keep the databank current
and in general to maintain it in usable form. In summary:
the planning and organization of a databank containing up-to-date
agricultural crop information with data such as local
planting times will be a critical requirement for the integration
of the LANDSAT crop survey applications with USDA crop
surveys.*

*See Appendix C for a discussion of the issues concerning
the use of crop calendars to assist in the task of remote
sensing identification of crops.





4. CONCLUSIONS & RECOMMENDATIONS


4.1 Conclusions

The use of LANDSAT in U.S. crop surveys has significant
potential benefits if the accuracy and timeliness of
existing crop production estimates can be improved significantly
thereby. To achieve the goal, it is necessary that a
qualified organization should receive the LANDSAT crop survey
information and integrate it with crop information obtained
by other methods. So long as LANDSAT supplies only the acreage
component of a U.S. crop production estimate*, there is
a substantial body of agricultural data which would be required
in addition to LANDSAT data. At the present time, only the
USDA has the independent capability to acquire, process and
integrate all of these data into a timely and accurate crop report.
The development of the remote sensing capability in
agriculture into a crop reporting system requires expertise far
beyond the classification and interpretation of LANDSAT data
on crop producing areas. We feel that technological improvements
in crop survey should be pursued in full cooperation with
USDA and should have full support from existing USDA bureaus
and institutions for preparation of crop reports in order to
achieve maximum public acceptance and economic usefulness.


*Production=Acreage x Yield per acre 
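
The identity in the footnote also shows why an accurate acreage figure alone does not give an accurate production figure. The sketch below is illustrative only: it assumes the acreage and yield errors are independent and small, so that their relative errors combine approximately in quadrature, and the function name and numbers are hypothetical.

```python
import math


def production_estimate(acreage, yield_per_acre, rel_err_acreage, rel_err_yield):
    """Production = acreage x yield per acre, with a first-order relative error.

    Assumes independent, small relative errors in the two components.
    """
    production = acreage * yield_per_acre
    rel_err_production = math.sqrt(rel_err_acreage ** 2 + rel_err_yield ** 2)
    return production, rel_err_production


# Hypothetical: 1,000 acres at 30 bushels per acre, 3% acreage and 4% yield error.
print(production_estimate(1000.0, 30.0, 0.03, 0.04))
```

Under these assumptions the larger of the two component errors dominates, so reducing the acreage error well below the yield error brings little further improvement in the production estimate.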





Progress in the development of automatic processing of LANDSAT
data may lead eventually (say in the 1980's) to an independent,
stand-alone system for crop reporting. However, even this conclusion
is doubtful and based only on certain broad assumptions
about the new technology rather than demonstrated facts.

In global crop surveys, the situation is more com- 
plicated due to 

(1) the incompleteness and inaccuracy of much of the 
existing crop data for foreign countries, and 

(2) the scale of the global survey task; complete 
and accurate crop reports for worldwide agri- 
culture would require many times as much data 
processing as U.S. crop surveys. 

Integration of LANDSAT data into foreign agricultural surveys 
should be pursued with the cooperation of USDA/FAS, while 
research is in progress to develop successful techniques to 
extract crop acreages and yield indicators from LANDSAT data. 
Obviously, much has to be learned before one can confidently 
predict a global crop survey capability using LANDSAT (or any 
of its successors) as the prime data source. We have concluded 
that the integration of satellite and ground data on worldwide 
crop production should be undertaken only after the successful
demonstration of advanced interpretation techniques for remotely
sensed data on agricultural areas outside the U.S. and
Canada.





4.2 Specific Recommendations on Integration of Data 

4.2.1 Techniques 

Development of techniques to select, classify and
mensurate a statistical sample of LANDSAT data on crop producing
areas in the U.S. must be continued. Expansion of the
sample results to provide an estimate of the crop production 
for the reporting region - whether that is county, state or 
nation - must be scientifically researched. In addition to 
the geographical considerations of sample design, the problems 
of timing of data and selection are critical, particularly in 
the presence of cloud cover. 

We recommend that NASA should promote research on 
the following technical issues relating to the U.S. crop survey 
application of LANDSAT: 

• overcoming cloud-cover problems in the sampling
of relevant U.S. crop data from the LANDSAT
data resource,

• the development and updating of the databank of
"ancillary" agricultural data (i.e., not remotely
sensed) required for automatic processing of
remotely sensed crop data into meaningful crop
production estimates,

• the sampling of LANDSAT data for efficient
statistical inference on national and regional
(state and county) crop production - stratified,
multi-frame samples in relation to variety of
cropping practice and time of year are expected, and

• the accurate cartographic registration of LANDSAT
images to allow for easy comparability of the
LANDSAT interpretive results with existing USDA
crop survey results.

4.2.2 Evolutionary Approach to Integration 

We take the position that there are advantages, 
both technical and economic, to an evolutionary staged approach 
to the integration of LANDSAT crop data into the crop reporting 
system. A possible scenario for this solution is presented for 
illustration of the method: 


Stage IA

Develop statistical and data processing techniques for using
LANDSAT data to obtain state and national crop acreage
figures for a few selected crops in the United States.

Stage IB

Develop crop yield models and associated inputs for yield
measurement from LANDSAT data. Explore the feasibility of
obtaining an accurate crop yield measurement system using
LANDSAT data to provide local crop condition data in each
crop reporting district (CRD) or other regional subdivision.

Stage IIA

Use new crop acreage estimates from LANDSAT together with USDA
crop survey data in an integrated crop reporting system.

Stage IIIA

Develop new agricultural informational services based on daily,
weekly or monthly regional surveys of crops from LANDSAT data,
e.g., planting progress reports (acreage), harvest progress
reports (acreage), crop stress warnings (yield factor), crop
condition assessments during growing season (yield factor).

Stage IIIB

Develop a new crop survey system integrating fully the satellite
data with ground data and replacing older, less cost-effective
survey techniques with satellite remote sensing
techniques.

The logical relationship between the stages is 
indicated in Figure 4.1. The branch ending at IIIA describes 
an integrated approach to the use of LANDSAT imagery for crop 
acreage estimation based on low accuracy of the LANDSAT crop 
survey results. The other branch refers to an independent 
LANDSAT-type system for crop survey based on high accuracy of 
survey results. Stage IB develops inputs to yield prediction 
models and is independent of the acreage developments.










APPENDIX A 


THE MEASUREMENT OF THE YIELD COMPONENT*


The Determinants of Wheat Yield

This section will provide an overview of the primary factors which
impact upon yield. Later in this study we will illustrate which of these factors
are contained in yield models.

The factors affecting plant growth are numerous and complex and
their effects vary with the growth stages and the time of planting.
Physiologists have defined more than a dozen stages in plant growth when
observations and measurements can be made. Most of the literature consulted
in this study referred to from six to nine stages. Two commonly used sets of key
growth stages are:



      Growth Stage        Growth Stage
      ------------------  ---------------------------
a.    Tillering           Seedling (emergence)
b.    Early Joint         Tillering (5 or <5 leaves)
c.    Late Joint          Tillering (>5 leaves)
d.    Boot                Jointing
e.    Heading             Boot
f.    Anthesis            Heading (50% of head out)
g.    Berry               Flowering
h.    Milk-Soft Dough     Dough
i.    Ripe                Ripe


The stages which have been most widely used as growth parameters are
emergence, heading and ripe. There is considerable year-to-year variation
in the time of occurrence of each growth stage as well as the degree of
plant development in each stage caused by environmental and strategic
factors. These, in turn, determine variation in ultimate wheat yield.

*Taken from "The Use of the Earth Resources Technology Satellite (ERTS)
for Crop Production Forecasts", Draft Final Report of the Task Force
on Agricultural Forecasting, Goddard Space Flight Center,
July, 1974.




Figure 1 depicts the interrelationships between the various elements 
which determine wheat yield. The final yield will be determined by both 
growth factors and by factors which cause crop abandonment (i.e., failure
to harvest the crop).

Growth Factors 


The factors that affect the growth of wheat can be divided
into those which are determined by the environment
and those that are related to strategy options available to
farmers. The environmental influence consists of a number of
factors including soil characteristics, temperature, moisture,
light, wind and carbon dioxide. Each of these will now be
briefly discussed:

Soil 

Soil is a physical medium for plant growth and provides moisture
and nutrients to crops. On the other hand, it harbors insects and diseases
which can attack plants. The physical qualities of soils, which are measured
by such items as texture, permeability, available water capacity, liquid
limit, the plasticity index, density, acidity-alkalinity reaction, and chemical
properties (e.g., organic carbon percentage, electrical conductivity, calcium
carbonate equivalent, etc.), can impede or facilitate the movement of water and
certain nutrients such as nitrate and sulfate ions. Because of their complexity,
many of the properties of soils and their interactions with plants have
not been quantified. However, it is known that the above-mentioned factors
affect most of the stages of plant growth and ultimate yield.


Temperature 

Air and soil temperatures significantly affect wheat at various
stages of plant growth. Seeds will not germinate if the soil temperature is
below 40°-45° F. Cooler temperatures usually cause slower growth. The
maturities of various plants are determined largely by degree-days.
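
The degree-day idea mentioned above can be illustrated with a short sketch. The report does not give a formula, so the version below is an assumption: it uses the common daily-mean method and a hypothetical base temperature of 40 °F, chosen only because it matches the lower germination threshold quoted in the text.

```python
def growing_degree_days(daily_min_f, daily_max_f, base_f=40.0):
    """Accumulate degree-days (deg F) above a base temperature.

    base_f is an assumed threshold, not a value specified in the report.
    """
    total = 0.0
    for t_min, t_max in zip(daily_min_f, daily_max_f):
        mean_temp = (t_min + t_max) / 2.0
        # Days whose mean is at or below the base contribute nothing.
        total += max(0.0, mean_temp - base_f)
    return total


# Three hypothetical days with daily means of 35, 50 and 60 deg F.
gdd = growing_degree_days([30, 40, 50], [40, 60, 70])
```

A crop would be treated as reaching a given stage once the accumulated total passes a variety-specific requirement, which would have to be estimated from field data.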

Moisture

Moisture is the most commonly discussed environmental factor
in the literature. The amount of soil moisture at seeding time, the seasonality,
frequency and duration of rainfall during the season, as well as the total
seasonal amount, all significantly affect plant development. During the
growing season, plant roots take moisture from the soil and transpire much
of it back to the atmosphere through the leaves. When soil moisture falls
below the wilting point for that soil, the plant becomes moisture deficient
and further development is retarded. Decreased yield or plant death could
follow. Water accumulating on the surface of the soil can delay planting
or drown or retard the growth of already planted seeds. Heavy rains on
growing plants can also cause lodging. Lodging can delay plant maturity,
lengthen the time needed to combine-harvest, and result in the
sprouting of kernels that are in contact with the ground.

Light

Light is the catalyst necessary for the conversion of carbon dioxide
and water into sugars, protein and, ultimately, yield. Latitude and intensity
of sunlight are the primary factors. Latitude affects day-length, and both
short-wave (solar) and long-wave (terrestrial) radiation are correlated with
cloud cover. Rates of photosynthesis depend upon the receipt of visible light,
and rates of transpiration are affected by the net exchange of radiation by the
crop canopy.

There is little man can do, at the present time, to control
day-length. However, wheat growers can modify the amount of light that strikes
each plant leaf by adjusting seeding rate, distance between plant rows
and distance between plants, and by breeding new seed varieties with
nearly upright leaves in order to minimize shading and maximize the
amount of leaf area exposed to sunlight.

Wind 

The major effect of this variable is in causing lodging of wheat plants.
This could delay ripening and cause problems in harvesting.

Carbon Dioxide

This gas is needed by plants to carry on photosynthesis. Experiments
have shown that increasing the atmosphere's concentration of this gas above
normal levels increases dry matter significantly. Thus, the composition of the
atmosphere will affect wheat yields.

Some methods man could use to modify these environmental factors
include:

a. Irrigation is used to augment natural precipitation.
The importance of the proper amount of soil moisture
both before seeding and during growth has been
discussed above.

b. Fertilization - commercial fertilizers supplement soil
nutrients in more than half the wheat fields. The
drier the area, the less fertilizer is used. A deficiency
of each of the essential mineral elements required for
plant growth results in a specific change in color and/
or shape of the plant. In general, partial lack of a nutrient
causes a plant's leaves to turn some shade of yellow and
results in a shorter plant with lower yield.

c. Planting practices include depth of planting, plant
spacing and date of planting. Farmers adjust the depth
of planting according to soil moisture and temperature.
As a general rule, the cooler and moister the soil, the
closer to the surface the seed is placed in order to
provide maximum yield.

Plant spacing affects time of covering the ground,
weed incidence, available moisture supply, the
amount of leaf area exposed to sunlight and, ultimately,
yield. Research and farmer experience have provided
management with the knowledge to consider these
factors with a view toward obtaining the highest
possible yields.

Generally, the earlier the date of planting of spring
wheat, the higher the expected yield. However, this
is constrained by soil temperature and moisture and the
probable amount of danger from frost for the emerging
plants. Planting of winter wheat will generally wait
for an adequate level of soil moisture and take into account the
danger of Hessian fly.

Crop pattern alterations prevent water and nutrient supplies
of the soil from being depleted. For example, summer
fallowing is carried on in order to store up the year's
rainfall and accumulate nitrates.

d. Herbicides, insecticides and pesticides are used to
control weeds, insects and diseases. Weeds, which
can diminish plant population and cause water deficiency,
can be controlled via herbicides and are less of a
problem than diseases and insects, which can cause
decreased yields or complete crop loss.

e. New seed varieties are used to take advantage of
genetic differences among plants. These genetic
differences account for differences in the way in which
different plants react to environmental factors. Thus,
seed breeders are continually developing varieties
with varying characteristics of yield potential,
disease resistance, insect resistance, plant height,
stalk strength, length of growing season, drought
resistance, leaf conformation, root conformation and
winter hardiness.

As will be seen below, accounting for all these factors simultaneously
presents a serious problem in any analysis of the causes of variability in crop
yields.


CROP ABANDONMENT FACTORS 



Given that one could perfectly model the growth factors, it is still
necessary to consider those factors which might lead the farmer to fail to
harvest the crop. These can be partitioned into natural factors which cause
the crop to fail and economic factors which influence the farmers. The natural
factors include a) drought which, although it is at least partially accounted
for in consideration of precipitation, deserves mention here since it is such
a serious problem in some parts of the world, b) wind, hail, winterkill and
crop disease, which are generally difficult to forecast and not included as
explanatory variables in any of the models consulted in this study, and c)
insect damage, which might be mitigated by the use of pesticides. Note that
the environmental failure effects produce significant reductions in the
theoretical yield produced by existing models and that the occurrence of these
events is potentially detectable from space. Thus, a dramatic improvement
in yield prediction could be realized by including these factors in an
overall yield model.


The economic impact on crop abandonment is relatively straightforward
but is not considered in the yield models discovered during the
literature search. The current price of the crop, the cost of harvesting the
crop and the government support in the form of crop insurance combine to
provide trade-off decisions for the farmer. Planting of winter wheat for
forage and/or soil protection with the intention of plowing it under in the
spring is a fairly widespread practice which, if unaccounted for, could lead
to serious bias in estimated yield. In recent times the dramatic increases
in wheat price have in some cases led to a harvesting of crops which were
originally planted for forage purposes. Thus, if forage is considered in a
model, a potential for misspecification in the other direction exists.


ANALYTICAL APPROACHES USED IN PREVIOUS STUDIES 


The documents reviewed at this writing were written for a variety
of audiences, on a variety of topics, used different techniques of analysis
and contained differing attitudes and assumptions toward crop yield
forecasting. Some yield forecasting models were built for the purpose of estimating
the effects of variation of a single policy variable such as irrigation. Other
models are concerned with determining the relative effects of several different
variables that are known to affect crop yield and thereby understanding the
structure of the causative factors leading to crop yield. Still other studies
estimate a model for the primary purpose of predicting yield. The great
majority of the models studied are concerned more with effects of individual
factors and policy determination than they are with forecasting.

The techniques used in previous studies include: 

a. Regression analysis of local, state and national data 

b. Regression analysis of visual quantification of crop 
conditions for specific localities. 

c. Observations of crops under controlled environment

d. USDA surveys of farmers

e. Parametric time-series analysis

f. Estimation of formal production functions.


REGRESSION ANALYSIS


Regression analysis is the technique used most frequently in previous
studies. In this section we will discuss the general types of regression studies
encountered in the literature review and the difficulties encountered in
these studies which account for so many unsuccessful attempts at
forecasting crop yield. A more thorough background for this discussion appears
in the reviews of the literature in Appendix

The Nature of Previous Regression Studies 

The theory behind most existing models for yield prediction appears
to be that air composition and soil fertility exhibit little variation from year
to year by comparison with the considerable fluctuations in air temperature and
water supply. Positive or negative genetic factors and crop abandonment
factors are rarely explicitly considered.

Most of the earlier studies related wheat yields on a local or state basis
to environmental conditions such as inches of precipitation or average temperature
of critical months. One basic problem in these models is their inability to
account for technological change, especially more recent breakthroughs. A
typical way of handling this is to use a time trend to represent technological
change. This assumes some sort of systematic embodiment of technology.
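
The kind of model described in this paragraph can be sketched as an ordinary least-squares regression of yield on weather variables plus a linear time trend. This is an illustration only: the variable names, the synthetic data and the linear functional form are assumptions, not equations taken from any of the studies reviewed.

```python
import numpy as np


def fit_yield_with_trend(years, precip, temp, yields):
    """Least-squares fit of yield = b0 + b1*precip + b2*temp + b3*(year - first year)."""
    years = np.asarray(years, dtype=float)
    design = np.column_stack([
        np.ones_like(years),        # intercept
        np.asarray(precip, float),  # e.g. inches of precipitation in critical months
        np.asarray(temp, float),    # e.g. average temperature of critical months
        years - years[0],           # time trend standing in for technology
    ])
    coef, *_ = np.linalg.lstsq(design, np.asarray(yields, float), rcond=None)
    return coef


# Synthetic data generated from known coefficients, for illustration only.
years = np.arange(1960, 1970)
precip = np.array([12, 15, 9, 14, 11, 16, 10, 13, 8, 17], dtype=float)
temp = np.array([60, 62, 58, 65, 61, 59, 63, 64, 57, 66], dtype=float)
yields = 10.0 + 0.5 * precip - 0.2 * temp + 0.3 * (years - years[0])

coef = fit_yield_with_trend(years, precip, temp, yields)
```

Because the trend coefficient is a single constant, a sketch like this cannot represent a sudden technological breakthrough, which is the weakness the text points out.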





Another basic problem with some of these models is their use of
seasonal and even monthly averages of some of these variables. A number of
subsequent phenological and field studies have shown that there is a gradual
change of the effect of weather variables on crop yield development throughout
the growing season. R. A. Fisher (1924) developed a statistical technique
for analyzing the daily effect of rainfall at any time during the growing season.
This technique has since been used and modified by a number of studies,
especially those involving rainfall as the most critical explanatory variable.
The technique involves the estimation of the effect of rainfall as a polynomial
function of a biometeorological time variable. A similar approach is illustrated
by Baier (1973).
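
A simplified sketch in the spirit of this technique is given below. It assumes that yield is an intercept plus the sum over periods of a weight a(t) times rainfall r(t), where a(t) is a low-order polynomial in time, so the polynomial coefficients can be estimated by regressing yield on the "moments" of the rainfall distribution. Raw powers of time scaled to [0, 1] are used here for brevity; this is an assumed simplification, and all function names and data are hypothetical.

```python
import numpy as np


def fit_response_curve(rain, yields, degree=2):
    """Estimate a polynomial rainfall-response curve a(t).

    rain   : array (n_years, n_periods) of rainfall per period
    yields : array (n_years,) of observed yields
    Returns (intercept, weights), where weights[t] is the estimated
    effect on yield of one extra unit of rain in period t.
    """
    rain = np.asarray(rain, dtype=float)
    t = np.linspace(0.0, 1.0, rain.shape[1])
    powers = np.vander(t, degree + 1, increasing=True)  # columns t^0 .. t^degree
    moments = rain @ powers                             # sum over t of t^k * r_t
    design = np.column_stack([np.ones(len(moments)), moments])
    coef, *_ = np.linalg.lstsq(design, np.asarray(yields, float), rcond=None)
    return coef[0], powers @ coef[1:]


# Synthetic check: data generated from a known quadratic response curve.
rng = np.random.default_rng(0)
rain = rng.uniform(0.0, 3.0, size=(30, 12))
t = np.linspace(0.0, 1.0, 12)
true_weights = 1.0 + 2.0 * t - 3.0 * t ** 2
yields = 5.0 + rain @ true_weights

intercept, weights = fit_response_curve(rain, yields, degree=2)
```

The design choice is that only `degree + 1` coefficients are estimated regardless of how many periods the season is divided into, which is what makes daily or weekly effects estimable from a short yield record.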

As indicated above, there is considerable interaction of causative
factors. For example, the use of fertilizer might increase the response of the
crop to additional soil moisture or precipitation.

For some meteorological variables their interacting effects have
been partially captured by the development of new weather parameters which
can be derived from standard climatological data and are related to the way in which
plants and soil conditions react to them. Examples of this are such relatively
new concepts as potential evapotranspiration, heat units and soil moisture
budgeting. For example, Mack and Ferguson (1968) developed a moisture stress
index for a wheat crop using the modulated soil moisture budget developed by
Holmes and Robertson in an earlier study. This index is expressed as the
difference between potential evapotranspiration and actual evapotranspiration
and is found to correlate more closely with wheat yields than other water-related
variables tested, such as seasonal precipitation. Nix and Fitzpatrick (1959)
develop a crop water stress index which accounted for a greater proportion
provided the best statistical results. However, it is possible that poor data
reporting systems in Turkey might have made disaggregated data more vulnerable
to errors. Williams (1970) estimated yields for each of the crop districts in the
Canadian prairies and extrapolated the results for each province and for the
Canadian prairies as a whole based upon acreage values and similarities of
environmental conditions. Although the national estimates appear accurate,
some district and provincial totals were underestimated while others were
overestimated, thereby compensating each other. Probably, if the errors of the
individual local estimates were random, an aggregation of many local estimates
would result in a lower standard error for the national total than for the local
estimate. However, because of the factors mentioned above, this would require
different equations for each local area.
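
The claim about aggregation can be made concrete with a small calculation. Assuming, as the text does, that the local errors are random and mutually independent, the variance of an acreage-weighted mean yield is the sum of the squared weights times the local variances. The district figures below are hypothetical.

```python
import math


def national_standard_error(acreages, local_std_errors):
    """Standard error of the acreage-weighted mean yield.

    Assumes the local estimation errors are independent of one another.
    """
    total = sum(acreages)
    variance = sum((a / total) ** 2 * s ** 2
                   for a, s in zip(acreages, local_std_errors))
    return math.sqrt(variance)


# 25 hypothetical districts of equal acreage, each with a standard
# error of 3 bushels per acre.
se = national_standard_error([100.0] * 25, [3.0] * 25)
```

With equal acreages the result is the local standard error divided by the square root of the number of districts. If the local errors share a common component (for example, a weather effect missed everywhere), the independence assumption fails and the reduction is smaller.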

VISUAL QUANTIFICATION OF PLANT DEVELOPMENT

This technique was developed by Professor J. R. Haun of Clemson
University, Clemson, South Carolina. Under this technique,
daily observations of wheat development were recorded as an index (based upon
the rate of development of leaves and other plant parts). This was regressed
against age, cumulative development and environmental factors and various
lags, transformations and cross products. The observations were made on
five wheat plantings in 1966 in Dickerson, North Dakota and the predictive
equation was tested using 1967 data. The actual and predicted estimates
appear in close agreement. However, some systematic bias is evident. In
a paper due to be published this month, the author will demonstrate the use of
this model in predictions of yields.

The application of this model to national totals would require
extensive gathering of morphological data throughout the growing season.

Chirkov (1973) reports that the Russians have had considerable
success in forecasting wheat yields by observing physical characteristics
of plant development. For example, for dark soils, the factors described as
influencing wheat yield predictions, in order of primary importance, are number
of stems in the spring, phase of emergence of the stalk, and number of ear-bearing
stems in the flowering phase. A secondary factor is the height of winter
wheat plants starting from the flowering phase, and a tertiary factor is the
supply of available moisture in the soil layer from 0-100 cm during the ten days
following the resumption of growth in the spring.

A confidence factor of 80% for prediction of the yield of winter
wheat is claimed using only moisture supply, number of stems per m² in the
spring or in the phase of emergence of the stalk and, for a forecast prepared in
the flowering phase, the number of stems with an ear and the height of the
plants. Inclusion of secondary factors is said to increase the confidence
factor to 90 percent.

It is stated without backup that equations have been developed which
forecast the yield of winter wheat with great confidence for individual fields,
oblasts, regions, republics and for the country as a whole.

OBSERVATIONS OF CROPS UNDER CONTROLLED ENVIRONMENT 


Many studies in which plants are grown under controlled
conditions are referenced in the literature and several have already been
reviewed at this writing. These include wheat grown in greenhouses or
on small plots in which almost all factors are held constant except the
particular one the experimenter is interested in. The studies that
have already been reviewed in this effort include those investigating
the effects on yield of changes in soil moisture, different types of
herbicides, nitrogen fertilizers, ethrel and supplemental irrigation.
These studies are generally useful in enumerating factors which affect
wheat yield, but are of too limited a purpose to be used to estimate
national crop yields.











USDA SURVEY TECHNIQUES


A few documents discuss the use of surveys in the U. S. and
Australia to forecast crop yield at different times during the growing season.
Understanding this technique generally involves two parts: a description of
the data collection techniques and a description of the forecasting techniques.


In the U. S., information is collected by mail surveys, telephone
contacts, personal interviews and observations in selected fields from
producers, feeders, grain elevator operators, and exporters. This information
includes acreage intended for planting, planted, intended for harvest and
harvested, expected yields and production, inventories, employment and
wages. The results of these surveys are checked for consistency against
information collected for the Agricultural Census conducted every five years
and other relevant data.


For supplemental information, an objective yield survey is performed
in which trained enumerators visit 17,000 sample plots in a sample of fields
during the growing season to obtain quantitative data on such factors as number
of plants per plot, plant spacings, number of wheat heads and spikelets,
stage of development, final yield and harvesting loss. This information is
gathered monthly.


The annual cycle of crop projections begins with a report on farmers'
intentions to plant. This report is based upon data gathered in the February
surveys and is published in March.


The second major survey, in early June, when most crops are
in the ground, is combined with the June Enumerative Survey and published
in the July Crop Report along with estimated production during the forecast
season of August through November. An acreage update survey is conducted
each July to determine changes that need to be made in June data. This
first update appears in the August Crop Report. A third survey effort in the
Fall measures acreage actually harvested.

The system for estimating yields relies on a "graphic regression
method" which relates reported crop conditions to a forecast of yield. Crop
reporters estimate the probable average yield in their localities and the averages
of these forecasts are translated into yield forecasts by the Crop Reporting
Board by means of regression charts which relate historical "true" yields to
reported probable yields. In some states, a regression equation is used to
forecast yield per acre as a function of a) reported condition of crop (reported
yield per acre), b) precipitation for specified months prior to date of forecast,
c) precipitation for specified months after date of forecast and d) time.
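
The chart-based step can be sketched numerically as a straight-line fit of historical final yields on the reported probable yields, followed by a read-off for the current report. This is only an analogue of the graphic method: the straight-line form, the function names and the data are assumptions made for illustration.

```python
import numpy as np


def fit_regression_chart(reported_history, final_history):
    """Fit final ("true") yield as a linear function of reported probable yield."""
    slope, intercept = np.polyfit(reported_history, final_history, 1)
    return slope, intercept


def forecast_yield(reported_now, slope, intercept):
    """Translate the current average reported yield into a yield forecast."""
    return intercept + slope * reported_now


# Hypothetical history in which reporters are systematically off:
# final yield = 2 + 0.9 * reported yield.
reported_history = np.array([20.0, 22.0, 25.0, 28.0, 30.0])
final_history = 2.0 + 0.9 * reported_history

slope, intercept = fit_regression_chart(reported_history, final_history)
forecast = forecast_yield(26.0, slope, intercept)
```

The point of the calibration is that a persistent bias in reporters' judgments is removed automatically, provided the bias is stable from year to year.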







Gunnelson, Dobson and Pamperin (1972) examined the accuracy
of more than 1,100 USDA crop production forecasts for barley, corn, oats,
potatoes, soybeans, spring wheat and winter wheat for the period 1929-1970.

They found that USDA forecasts generally exhibit desirable properties based
upon their criteria. Unsatisfactory first forecasts were divided almost equally
between those which exhibited turning point errors and those which correctly
indicated the direction of change but which erred significantly in magnitude.
First and second revised forecasts showed improvement over the first forecast.
The lowest percentages of satisfactory revisions were found for winter wheat (59.5
and 52.4 percent for first and second revisions respectively). Although the
revised forecasts tended to be successful, they tended to undercompensate for
the error in the previous estimate.

In general the accuracy of first forecasts seems to have shown
moderate improvement between 1929 and 1970; that of the first revisions remained
relatively constant; and that of the second revisions appears to have improved.

Although this study revealed no serious inadequacies in crop forecasts,
the analysis identified a few persistent inaccuracies in the forecasts. Specifically,
USDA tends to:


a. Underestimate crop size,

b. Underestimate the size of changes in production from
year to year, and

c. Undercompensate for errors in previous forecasts when
developing revisions.
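
The two kinds of unsatisfactory forecast distinguished above (turning point errors and magnitude errors) can be sketched as a simple classification rule. The criteria actually used in the study are not given in this review, so the 5 percent tolerance and the function below are hypothetical.

```python
def classify_forecast(previous_actual, forecast, actual, tolerance=0.05):
    """Classify one forecast against last year's actual and this year's actual.

    tolerance is an assumed relative error threshold, not a value
    taken from the study.
    """
    predicted_change = forecast - previous_actual
    actual_change = actual - previous_actual
    # Opposite signs: the forecast pointed the wrong way.
    if predicted_change * actual_change < 0:
        return "turning_point_error"
    # Right direction, but too far from the realized value.
    if abs(forecast - actual) > tolerance * abs(actual):
        return "magnitude_error"
    return "satisfactory"


wrong_direction = classify_forecast(100.0, 95.0, 110.0)
too_small_change = classify_forecast(100.0, 105.0, 130.0)
acceptable = classify_forecast(100.0, 108.0, 110.0)
```

The second example also illustrates tendency (b) in the list: the forecast gets the direction right but understates the size of the year-to-year change.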

While USDA crop forecasts exhibit desirable characteristics when
appraised by these criteria, it is possible that the levels of some of the
forecasting errors exhibited may create planning problems for farmers and marketing
firms.

PARAMETRIC TIME SERIES ANALYSIS

This technique is based upon two assumptions regarding the factors
affecting yield. First, it is assumed that the major factor affecting yields,
weather, is difficult to forecast and second, that the embodiment of technological
change is highly correlated through time. Because of this, no attempt is
made to identify the underlying structural relationships, and national average
crop yield data are used for identifying and estimating the autoregressive
process. The results showed poor forecast accuracy. This appears
understandable since from qualitative information we know that yield variation
around the time trend is substantial.
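
A minimal sketch of this kind of model is a linear time trend with a first-order autoregressive term fitted to the deviations from trend. The order of the process and the estimation method used in the study reviewed are not stated here, so the AR(1) form and least-squares fit below are assumptions, and the yield series is synthetic.

```python
import numpy as np


def trend_ar1_forecast(yields):
    """One-step-ahead forecast from a linear trend plus AR(1) residuals.

    Returns (forecast, phi, slope).
    """
    y = np.asarray(yields, dtype=float)
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)
    # AR(1) coefficient: least-squares regression of resid[t] on resid[t-1].
    phi = float(resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1]))
    next_trend = intercept + slope * len(y)
    return next_trend + phi * resid[-1], phi, slope


# Synthetic national yields: a rising trend with alternating good and bad years.
t = np.arange(10)
series = 20.0 + 0.5 * t + np.where(t % 2 == 0, 1.0, -1.0)

forecast, phi, slope = trend_ar1_forecast(series)
```

When the deviations from trend are large and driven by weather that is essentially unpredictable a year ahead, the autoregressive term carries little information, which is consistent with the poor accuracy reported.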




ESTIMATION OF PRODUCTION FUNCTIONS 


Studies which estimate production functions so as to compare
factor input are of interest in aiding our understanding of the
production process but are of limited use in forecasting crop yields.

SPECIFIC MODELS OF INTEREST

This section discusses the specific models found in the literature
to have relevance to crop yield forecasting. Although most of these models
are not meant to be used specifically as a forecasting tool, they can be
adapted for this function and they provide valuable information which can be
used to construct such a model. The information provided in the published and
unpublished literature is inconsistent, with some models presented in more
detail than others. The time and resources available in this study did not,
in most cases, allow us to gather data beyond the published literature.

In general, most of the models reviewed in this study would probably
not provide as accurate a forecast as does the USDA system for national wheat
crop forecasting. This is due to a number of factors. First, these models have
not been successfully extrapolated to national totals. This is because they are
either estimated from very local data, use very broad assumptions or require
quite complex information networks. Second, genetic factors and crop
abandonment factors are rarely considered explicitly. Comparisons with local USDA
forecasts were generally not performed.

- - An accurate validation of a forecasting model should include xore- 

Iteti?tS\° teftst/dVf' thf^ of 

during the sample period. Such a description should include a discussion of
how well the model predicts turning points. In view of these criteria,
discussion of validation of these models is slight or nonexistent.

Variables related to water use by plants appear to be the most
significant variables in these models. These include soil moisture, moisture
stress, and potential and actual evapotranspiration.
Furthermore, the effects of these variables change with the age of the plant.

We will now briefly discuss a few of these models which appear to
offer some merit in deriving a forecasting model. Table 1 has been prepared
as a handy summary of the properties of these models.

Weather and Canadian Prairie Wheat Production

This study by G. D. V. Williams (1960) reports on the use of
regression techniques to analyze wheat production. The dependent variable
was wheat yields in various regions in Canada. Explanatory variables were:


a. Precipitation conserved in the 21-month summerfallow
period prior to May 1st of that year,

b. Precipitation for May, June and July (three variables),

c. Estimated potential evapotranspiration for May, June
and July (three variables), and

d. Various combinations and powers of the above.

Although these variables are listed, the actual equations used
were not presented in the document reviewed. It is
stated that there were a number of different equations
estimated for different time periods of from 7 to 14 years
between 1952 and 1967.

District crop yield estimates are then extrapolated to a total 
for the Canadian prairies according to a weighting system using acreage values. 
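
The acreage weighting described here can be sketched as follows. The exact weighting system used in the study is not given in this review, so a plain acreage-weighted mean is assumed, and the district figures are hypothetical.

```python
def prairie_yield(district_yields, district_acres):
    """Acreage-weighted average yield over all districts."""
    total_acres = sum(district_acres)
    # Production per district is yield per acre times acres.
    production = sum(y * a for y, a in zip(district_yields, district_acres))
    return production / total_acres


# Two hypothetical districts: 20 bu/acre on 100 acres, 30 bu/acre on 300 acres.
overall = prairie_yield([20.0, 30.0], [100.0, 300.0])
```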


Using equations based on data prior to 1960, estimates of wheat
yields were made for the period 1960 to 1967 based on precipitation and PE
data available before the end of July, June and May, respectively. For this
period the extrapolations appeared to catch turning points and direction
quite well, although they did not reflect year to year differences very closely.
Although 1961 was an unusually poor year, the estimate was close. This
indicates that in practice, if weather-based estimates were being made for
the current year, the equations could be developed from, say, the preceding
ten years rather than from an equation that was estimated for a period ending several
years earlier. Estimates made on data available at the end of June would
probably be very close to those at the end of July. However, those performed
at the end of May are less accurate.

Although national estimates appear accurate, some district or
provincial totals were underestimated while others were overestimated, thereby
compensating each other.

Wheat Production In Turkey 

A study published by the U. S. Department of Agriculture in 1970
reports on regressions of wheat yields against weather conditions during
different parts of the growing season, mechanization and fertilizer use over
the period 1940-1968. Weather conditions for all 12 months of the year
were tested for significant correlation with wheat yields, as was a mechanization
variable. The best equation was:







$$ y = 883.9 - 2.03\,X_{5} + 11.15\,X_{12} + \ldots\,X_{13} $$

(3.93) (4.31) (3.04)

R = 0.82

SD = 104.3

(The coefficient of $X_{13}$ is not legible in the source copy.)

$X_{5}$ = January - February aridity index for Ankara

$X_{12}$ = May - June aridity index for Ankara

$X_{13}$ = Fertilizer consumed in 1,000 metric tons

The standard deviation is about nine percent of 1968 yield values.

When the equation was used to predict yields beyond the sample period
(1948-1968), the error was less than five percent for 1969 and 1970. The
error for 1971 was not reported in the paper. However, it is cautioned that
since the standard deviation is nine percent, this sort of accuracy is not likely
to hold further into the future. The model would have to be updated periodically
since the methods and patterns of wheat production in Turkey are changing
rapidly.


The Thompson Model

L. M. Thompson (1969) estimated a number of regression
equations of time trends and weather variables on wheat yields for
six states (North and South Dakota, Kansas, Oklahoma, Indiana, and
Illinois). Weather variables included state averages of precipitation,
rainfall and temperature for various months throughout the year.

There has been some criticism of the use of state averages of weather
variables since wheat is not evenly distributed throughout the state.
However, there is some "tendency for favorable or unfavorable
conditions from year to year to be fairly widespread."

The six equations estimated are presented in the original review
in Appendix 1. Coefficients of determination ranged from 0.80 to 0.92 and
standard errors ranged from about 9 to 12 percent of 1968 yield.

The only hint of an attempt at validation in this paper is a graphical
comparison of the model's estimates with those of USDA.


The Baier Model (1973)

This model incorporates several new features which take advantage
of recent developments in the understanding of agrometeorological
interrelations. Instead of using rainfall data, the model uses potential
evapotranspiration (PE) and soil moisture (SM) as independent variables. In
addition, the concept of biological time (BT) (rate of development toward
maturity) is introduced.

It is assumed that the yield response of a crop to these variables
changes gradually over the season and that the daily weighting of each variable
can be adequately fitted by a fourth-order polynomial as a function of
biometeorological time. These functions are estimated by an iterative regression
process. These estimates are then used as explanatory variables in a
multiplicative regression model. This technique is further explained in the
appendix.
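
One possible reading of the structure just described is sketched below: each variable receives a daily weight given by a fourth-order polynomial in biometeorological time, the weighted variables are multiplied together day by day, and the daily products are summed over the season. The equations of the model are not reproduced in this review, so the functional form, the function names and the numbers are assumptions, and the sketch shows only the forward calculation, not the iterative estimation.

```python
import numpy as np


def daily_weight(coeffs, bt):
    """Evaluate c0 + c1*bt + ... + c4*bt**4 at biometeorological time bt."""
    return np.polynomial.polynomial.polyval(bt, coeffs)


def yield_index(bt, variables, coeff_sets):
    """Sum over days of the product of the weighted daily variables.

    bt         : biometeorological time for each day
    variables  : list of daily series (e.g. max temp, min temp, soil moisture)
    coeff_sets : one list of five polynomial coefficients per variable
    """
    bt = np.asarray(bt, dtype=float)
    daily = np.ones_like(bt)
    for series, coeffs in zip(variables, coeff_sets):
        daily *= daily_weight(coeffs, bt) * np.asarray(series, dtype=float)
    return float(daily.sum())


# Hypothetical three-day season with two variables.
bt = [0.0, 0.5, 1.0]
constant_weight = [1.0, 0.0, 0.0, 0.0, 0.0]  # weight of 1 on every day
rising_weight = [0.0, 1.0, 0.0, 0.0, 0.0]    # weight equal to bt
index = yield_index(bt, [[2.0, 2.0, 2.0], [1.0, 2.0, 3.0]],
                    [constant_weight, rising_weight])
```

The multiplicative form means that a very unfavorable value of one variable on a given day suppresses that day's contribution regardless of the other variables, which is one way of representing interaction among factors.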

The equations derived are not presented in the paper, but the
variables used are maximum temperature, minimum temperature and soil
moisture as functions of time. The best coefficient of determination was
0.79. The model was not used for forecasting beyond the time period or
latitude in the sample.

Although the methodology appears to show potential for accounting
for daily changes in plant response to environment, the present model cannot
be used successfully as a forecasting tool since it has not been tested, the
data are quite dated (1953-1962) and the results have not been extrapolated
to national totals.

Proprietary Commercial Models

The documents consulted in this study consisted primarily of those
that have been published through journal articles, universities and domestic
and foreign governmental agricultural services. However, in our various
telephone conversations with experts in this field around the country we have
become aware that there are a number of models in existence constructed by
private firms for commercial purposes. The exact structure and estimation
techniques used are said to be proprietary and therefore these models are not
generally available for detailed review. However, a general description of
a model available through the Development Planning and Research Associates,
Inc., (Manhattan, Kansas) is provided here:

The DPRA model is claimed to have overcome many of the
shortcomings of the regression models discussed above by incorporating simultaneously
much detailed information regarding the phenology and production of wheat
(and other crops) into a detailed structural model of the plant growth process.
This model includes all of the crop growth factors mentioned above (including
both environmental factors such as temperature, soil moisture, solar
radiation and soil characteristics and man-made factors such as irrigation,
fertilizer, weed and insect control, time of planting, depth of planting and
rate of planting) as well as genetic factors such as maturity ratings of various
varieties of plants in various different climates.

The model has been used primarily for two purposes. The first
is to advise farmers on policy such as irrigation, fertilizer and cropping
patterns. The second use for this model is in forecasting yield. DPRA claims
to have a much greater degree of accuracy in this use than the presently
available USDA forecasts. These forecasts are available throughout the
season beginning shortly after planting. DPRA also states that although
present forecasts are regularly performed only on a field and regional basis,
the model can be expanded to national and worldwide levels with only a minimum
effort.

The model might be useful for any group wishing an additional
dimension with which to check forecasts made through other means.

CONCLUSION ON STATUS OF AGROMET MODELING 


We have seen that yield variation is caused by many growth
factors (environmental and genetic) and by crop abandonment factors
(environmental and economic). None of the yield forecasting models
reviewed in this study included crop abandonment factors. The nature
of the specific effect on yields of the growth factors is extremely complex
in that a) their effects vary with different stages of the crop growth cycle,
b) their effects are often lagged in complex distributions over time and c)
they interact with each other in complex ways, many of which are undefined.

Because of these complex factors, regression analysis, which
has been widely used in numerous studies, has been unable to capture the
underlying structural relationships of yield determination. The number of
variables that can be successfully used in a regression equation is far
fewer than the number of variables that affect crop yield. Furthermore,
most of the previous regression models were estimated for local or state areas
and cannot be satisfactorily extrapolated to national and world totals without
a massive data gathering effort.


Variables related to water use and temperature for certain critical
periods in the plant growth cycle are consistently the most important
variables in the studies consulted. In recent years, new ways of measuring
these variables (potential and actual evapotranspiration, moisture stress,
soil moisture budgeting and biological time) have shown promise of possibly
improving the predictive ability of regression equations. However, these
models still account for only 70 to 90 percent of the variation in yield and
have large standard errors of estimate.





A- 15 



Based on those large standard errors, on the results of the few
models that were examined for predictive accuracy and on the fact that these
models are generally valid only for a specific local area, it appears that none
of these models can predict national crop yields as accurately as the USDA
survey-judgmental system. This conclusion does not preclude the use of
some of these models as additional input to a judgmental process.

Recent advances in models which incorporate plant observations
with soil moisture data appear to hold some promise for accurate yield
predictions since the entire history of both environmental and genetic
effects is presumably contained in the current state of the plant. In some
cases, these visual observations are related to plant density and are therefore
potentially observable from space.

A realistic procedure for synoptic predictions of wheat
yield might be the development of ground truth in selected sites
coupled with sample survey techniques to develop region yield/acre
estimates. This would be followed by intensive monitoring of these
sites (remote and relayed in situ) by satellite coupled with satellite
estimates of variations in harvested acreage resulting from crop
abandonment factors.

Although these models have only limited use in forecasting
compared to the methods used by USDA, they are valuable in providing
much information regarding yield-environment interactions and in that recent
advances provide hope for increased accuracy sometime in the future. In areas
of the world where extensive data gathering networks are nonexistent,
agricultural forecasting models which rely on satellite data inputs might be
able to improve upon present forecasts.



ORIGINAL PAGE IS OF POOR QUALITY


A-16 









APPENDIX B 


"SCOPE AND METHODS OF THE STATISTICAL REPORTING SERVICE,"


USDA MISCELLANEOUS PUBLICATION NO. 1308, JULY 1975





INTRODUCTION 

Although the Statistical Reporting Service conducts
some of its surveys by virtually complete
enumeration of certain parts of the population,
most are based on samples drawn from the population.
With the use of modern techniques, sampling
is not only less costly in time and money
than a census, but also can produce more reliable
results.

The Service uses a great variety of sampling
techniques to produce current agricultural statistics
about crops, livestock, prices, and other
information relating to agriculture.

Significant advances in methods used have been
made in recent years, particularly with the emphasis
on probability sampling technology, although
nonprobability sampling retains an important
place in the work of the Statistical Reporting
Service.

This chapter provides a description of the
common sampling procedures (frame construction,
sample selection, analysis, and estimation)
currently used and describes some of the research
activities under way to improve the quality of
agricultural statistics.

THE SAMPLING FRAME AND 
SAMPLE SELECTION 

A basic consideration in any sample survey is
the sampling frame, which is an aggregate of units
or elements from which a sample can be selected.
From data collected in the sample, inferences may
be made about all the elements in the frame.
These elements collectively form the survey population,
which may or may not be the same as the
target population, which is the total universe of
elements about which information is desired. From
SRS surveys, estimates must be made for the
target population.

The type and quality of sampling frames have
much influence in determining sample design and
overall survey methods. The frames used by SRS
are of two basic types: the list frame and the
area frame.

List Frame Sampling

Sampling from list frames has for many years
played a prominent role in the collection of data
for agricultural statistics. A list frame is a list of
elements, presumably all from the population about
which inferences are to be made, along with appropriate
identifying data. Lists of farm operators,
including names and addresses, are used
for many of the surveys conducted by SRS and are
well suited for the collection of information by
mail. The low cost of data collection from a list
sample is one of the principal advantages of this
method. Another advantage is the ease with which
supplementary information for classifying the units
can be included as part of the frame. This allows
the use of efficient stratified sample designs.

The main disadvantage of the list frame is the
inability to compile "complete" lists; that is, lists
that represent all of the current units, such as
farms, livestockmen, or processors. Such units
are continually changing. For example, a list of
farm operators soon becomes outdated because
new operators enter the activity, others leave the
farm, some expand operations or lease land to
others, or there are other changes within the operations
themselves.



CHAPTER 2. SAMPLING METHODOLOGY AND ESTIMATION

Since probability sampling requires that all
units of the population be represented, list sampling
had few applications for probability surveys
until relatively recent developments permitted
selection from two or more frames that cover the
population. Applications of such multiple-frame
sampling are discussed later in this chapter.

Prior to the application of probability sampling
by SRS during the early 1960's, nonprobability
mail surveys were the principal means of collecting
data for current agricultural statistics. This
method is still used as an important data collection
technique for many commodities, but usually
requires supplemental survey information.

In using nonprobability mail samples, the shortcomings
must be recognized. First, lists of potential
respondents are not complete frames and,
while still useful, some lists tend to be selective
as well. Second, there is no assurance that respondents
who voluntarily complete and return a
questionnaire are typical or representative of those
who fail to do so. The second limitation can be
overcome with followup interviews of at least a
sample of nonrespondents. However, this is
usually not practical, considering the limitations
imposed by the frame, and nullifies the principal
advantage of nonprobability mail surveys: low
cost.

Despite the biases inherent in mail samples,
surveys of this type with sufficient response provide
consistent indications from survey to survey.
Appropriate methods of estimation are used to
remove biases from the estimates insofar as possible.

Area Frame Sampling

In 1954, SRS began investigating the use of
area frame sampling. A program was developed
and expanded to include the 48 conterminous
States by 1967 in a system of surveys for obtaining
information on crops, livestock, and other agricultural
items. Today area frame sampling is an
integral part of the SRS estimating program.

In area frame sampling the frame consists of
an aggregation of identifiable units of land (segments)
which may be sampled. For SRS purposes,
characteristics concerned with agriculture must
then be associated with these sample segments.
There are three different concepts that are useful
in associating agricultural activities with the area
frame. These are the closed segment, the open
segment, and the weighted segment.

The closed segment associates the agriculture
with the segment itself; it includes all that is inside
the segment boundaries and excludes all that
is not. In the open segment, all activities of farms
with headquarters located inside the segment
boundaries are associated with the segment regardless
of whether the activity itself is inside or
outside the segment boundaries. In the weighted
segment, all agriculture associated with a farm,
any part of which lies within the segment, is attributed
to the segment in proportion to the fraction
of the farm acreage that is inside the segment.
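
The three segment concepts can be compared on a small made-up example. The sketch below is illustrative only: the `Farm` record, its field names, and the cattle figures are assumptions introduced here and are not taken from SRS procedures.

```python
from dataclasses import dataclass

@dataclass
class Farm:
    # Hypothetical record for one farm touching a sample segment.
    total_acres: float        # all land operated by the farm
    acres_in_segment: float   # part of that land inside the segment
    hq_in_segment: bool       # True if the headquarters lies inside the segment
    cattle_total: float       # cattle on the whole farm
    cattle_in_segment: float  # cattle physically inside the segment

def closed_segment(farms):
    # Count only what is physically inside the segment boundaries.
    return sum(f.cattle_in_segment for f in farms)

def open_segment(farms):
    # Count the whole farm when its headquarters is inside the segment.
    return sum(f.cattle_total for f in farms if f.hq_in_segment)

def weighted_segment(farms):
    # Count each farm in proportion to its acreage inside the segment.
    return sum(f.cattle_total * f.acres_in_segment / f.total_acres
               for f in farms if f.acres_in_segment > 0)

farms = [
    Farm(400, 400, True, 100, 100),   # entirely inside the segment
    Farm(600, 150, False, 80, 0),     # headquarters outside, some land inside
    Farm(200, 50, True, 40, 10),      # headquarters inside, most land outside
]

closed_total = closed_segment(farms)      # 110
open_total = open_segment(farms)          # 140
weighted_total = weighted_segment(farms)  # 100 + 20 + 10
```

The three totals differ for the same segment, which is why the choice of concept depends on the item being estimated.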

For characteristics such as crop acreages
which are directly associated with land, the closed
segment has proved to be clearly superior in sampling
efficiency. But data concerning the economics
of the farming enterprise, for example, can
be more easily associated with the farm headquarters
and do not lend themselves to the closed
segment. The weighted segment is used to gain
efficiency by reducing variability caused by specialized
and widely differing sizes of farms.

A unique attribute of the area frame is that it
is a complete sampling frame. All desired agricultural
activities are represented when every unit
of land area has been given some positive probability
of being selected during the sampling
process. Furthermore, it does not suffer the same
kind of deterioration through time as does a list
frame.

The area frame lends itself well to enumerative
general-purpose surveys. It is not suited to mail
surveys, since names and addresses of persons living
or operating within the segment boundaries are
generally not known in advance. The area frame
is not efficient for special-purpose surveys or surveys
of highly specialized farming activities, because
the lack of supplementary information precludes
the segregation of farming enterprises of
a particular class.

Two basic types of area frames are in use by
SRS for general-purpose surveys. The first is the
frame developed for the Master Sample of Agriculture,
which was constructed in the early 1940's
at Iowa State University with the cooperation of
USDA and the Bureau of the Census. The Master
Sample was designed for sampling characteristics
associated with farms. The frame consists of
county maps upon which minor civil divisions
and frame units containing a specified number of
sampling units have been delineated. Each sampling
unit contained about four farms. SRS experience
suggested that segments half the size of
those of the Master Sample were more efficient
for general-purpose surveys, and these units are
being used. Crop reporting districts are used to
impose geographic stratification on the frame.
Typically, States contain about nine crop reporting
districts. Within these districts the agriculture
is fairly homogeneous. Allocation of segments
to crop reporting districts is about proportional
to the square root of value of products sold.
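
The square-root allocation rule can be illustrated with a short sketch. The district names, dollar values, and the total of 70 segments below are invented for illustration; rounding to whole segments is left out.

```python
import math

def sqrt_allocation(value_sold, n_segments):
    # Share of the sample for each district is proportional to the
    # square root of its value of products sold.
    roots = {d: math.sqrt(v) for d, v in value_sold.items()}
    total = sum(roots.values())
    return {d: n_segments * r / total for d, r in roots.items()}

# Hypothetical values of products sold (any consistent unit).
value_sold = {"district_1": 400.0, "district_2": 100.0, "district_3": 25.0}
shares = sqrt_allocation(value_sold, 70)
# Square roots are 20, 10, and 5, so the 70 segments split 40 / 20 / 10.
```

Compared with allocation proportional to value itself, the square root damps the pull of the largest district: district_1 has 16 times the value of district_3 but receives only 4 times as many segments.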

The Master Sample frame was available for use
from the beginning of SRS area frame sampling.
However, it was soon apparent from pilot work in
the Mountain States that stratification of land
according to use was essential. Consequently, the
second type of area frame used by SRS is the land
use frame, in which all land prior to sampling is
first classified according to use. The stratification
is based on extent and type of farming and can
be described in four broad categories: (1) intensively
cultivated areas where a significant portion
of the land is under cultivation, (2) extensive
agricultural areas used primarily for grazing and
producing livestock, (3) highly developed land
found in cities and industrial areas, and (4) nonagricultural
land, such as parks and other recreational
areas. In addition to land use stratification,
geographic stratification is frequently used
to separate differing agricultural areas.

Segments are of a predetermined size, with segment
counts associated with each area delineated
on maps according to size of area. Segments
typically are about 1 square mile in intensively
cultivated areas, several square miles and larger
in the more open farming areas, and about one-tenth
square mile in city and residential areas.
The number of segments sampled from each
stratum is determined by reviewing optimum
allocations for major commodities and choosing
a compromise for general-purpose sampling.

Land use frames are currently being developed
State by State as needs indicate and as time and
resources permit. States still using the Master
Sample frame are in the north central, south
central, and south Atlantic regions, where differences
of land use practices are less apparent.


Segment selection has generally followed a
systematic-sample approach where the frame listing
is arrayed geographically. Recently, interpenetrating
sample designs have been used. Interpenetrating
designs utilize several smaller independent
samples, and have more sample flexibility
and advantages in computing sample variation.
They also fit well with a sample rotation scheme.
Typically, 20 percent of the SRS segments are
rotated annually to relieve respondent burden.
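
A minimal sketch of an interpenetrating design follows, assuming each replicate is a systematic sample with its own independent random start and that the replicates are combined by simple averaging. The function names and the toy frame are hypothetical; the actual SRS selection procedure is not described in enough detail here to reproduce.

```python
import random
import statistics

def interpenetrating_samples(frame, n_replicates, interval, seed=0):
    # Each replicate is a systematic sample of the geographically
    # ordered frame, drawn with its own independent random start.
    rng = random.Random(seed)
    starts = [rng.randrange(interval) for _ in range(n_replicates)]
    return [frame[s::interval] for s in starts]

def replicate_estimate(frame_size, replicates):
    # Every replicate yields its own estimate of the frame total; the
    # spread among those estimates gives a simple standard error.
    totals = [frame_size * statistics.mean(r) for r in replicates]
    estimate = statistics.mean(totals)
    std_error = statistics.stdev(totals) / len(totals) ** 0.5
    return estimate, std_error

# Toy frame: one made-up value per segment, in geographic order.
frame = list(range(100))
replicates = interpenetrating_samples(frame, n_replicates=5, interval=20, seed=1)
estimate, std_error = replicate_estimate(len(frame), replicates)
```

With five replicates, replacing one replicate each year would correspond to the 20-percent annual rotation mentioned above.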

All selected segments are visited annually about
June 1 for the June enumerative survey to ascertain
planted crop acreages and inventories of hogs
and cattle, and to classify operations for purposes
of subsampling for subsequent surveys. All separate
land operating arrangements are delineated
within the segments and are referred to as "tracts."
To control sampling errors, the area sample is
supplemented with a small list frame sample of
known large livestock operations, this being a
limited form of multiple-frame sampling.

Sampling for several subsequent area frame
surveys uses the June information for classifying
tracts. The classifications made are utilized as
strata for second-stage sampling. Tracts are then
subsampled from each stratum at varying rates,
according to their information potential. The
December enumerative survey is the largest survey
of this type and focuses on fall-seeded crops
and livestock inventories. A large portion of the
tracts with wheat and livestock in June are selected.
Nonagricultural tracts are sampled very
lightly.

Multiple-Frame Sampling

A method rapidly gaining importance and use
in SRS surveys is multiple-frame sampling. As
the name implies, this technique includes the use
of more than one sampling frame. For SRS needs,
this means a list frame and an area frame.

Theory for multiple-frame sampling was developed
only as recently as the early 1960's. Research
under the leadership of Dr. H. O. Hartley¹
was supported by SRS at Iowa State University.
Concepts of multiple-frame sampling are basically
those of probability sampling concerning representation,
known probabilities, and randomness
of selection. In addition, two criteria need to be
considered: (1) Every element of the population
must belong to at least one of the sampling
frames, and (2) it must be possible to identify
for each selected unit to which frames, if any, it
belongs other than the one from which it was
selected. The use of a complete area frame
satisfies the first consideration. The second is more
difficult operationally, requiring the proper classification
of each tract operator as to whether he
is also included in the list frame.

¹ Dr. Hartley is currently Director, Institute of Statistics,
Texas A&M University.

Multiple-frame sampling has some distinct advantages
for SRS, particularly for items such as
livestock, specialized crops, and economic data.
These items are poorly correlated with land alone
and are inefficiently estimated by the area frame.
In multiple-frame sampling, most of the data for
the population of interest can be collected more
efficiently through the list frame. Some of the data
can be collected by mail. Also, it is usually possible
to develop and incorporate in the list frame
some index of size for units that is used in stratification.
The area frame measures list incompleteness.
In this way, the two frames complement
each other.
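
One way to picture how the area frame measures list incompleteness is to classify each sampled tract operator as on or off the list and expand the two groups separately. The record layout, operator names, and expansion factors below are assumptions made for this sketch, not SRS data.

```python
def split_by_domain(area_tracts, list_names):
    # Expand each sampled tract and assign it to the overlap domain
    # (operator also on the list) or the nonoverlap domain.
    overlap = sum(t["expansion"] * t["value"]
                  for t in area_tracts if t["operator"] in list_names)
    nonoverlap = sum(t["expansion"] * t["value"]
                     for t in area_tracts if t["operator"] not in list_names)
    return overlap, nonoverlap

# Hypothetical area-sample tracts (value = head of cattle on the tract).
area_tracts = [
    {"operator": "A", "expansion": 100, "value": 50},
    {"operator": "B", "expansion": 100, "value": 10},
    {"operator": "C", "expansion": 100, "value": 40},
]
list_names = {"A", "C"}  # operators found on the list frame

overlap, nonoverlap = split_by_domain(area_tracts, list_names)
incompleteness = nonoverlap / (overlap + nonoverlap)  # share missed by the list
```

In this toy case the list covers 9,000 of an estimated 10,000 head, so the area sample indicates that about one-tenth of the item is missed by the list.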

The State Statistical Offices have principal
responsibilities for developing list sampling frames
of farmers and ranchers for multiple-frame surveys.
A variety of list sources is used, including
State farm census, assessor's records, Agricultural
Stabilization and Conservation Service
(ASCS) lists, brand lists, and lists maintained
by State governments for inspection or control.
More specialized lists are often combined
with a basic list to improve list coverage.
Lists vary greatly in quality and usefulness and
require considerable effort to prepare before use
in sampling.

Often the list has to be converted into
computer-readable form. Units which are duplicated
must be removed and the indexes of size of operation
may have to be obtained from other sources.
Special large mail surveys are sometimes conducted
for the sole purpose of classifying farms
by type and size. County and local officials of
ASCS, the Extension Service, and other USDA
agencies have provided valuable assistance in list
development efforts.

After initial list development, maintenance and
updating are continual tasks. Without such efforts,
lists deteriorate rapidly and soon lose their
advantage in sampling efficiency.

ESTIMATION METHODS 

After a survey is designed, the sample selected,
and data collected, the data must be edited for
consistency and then summarized. From these
survey results the statistician must prepare the
estimates. The computations and procedures for
translating survey data into estimates involve technical
considerations. Usually more than one
method is available, but the choices are largely
specified by survey design and there are distinct
differences between deriving estimates from
nonprobability surveys and from surveys which follow
the concepts of probability theory.

Nonprobability Surveys 

In developing current estimates from nonprobability
mail surveys, estimating procedures must
recognize potential biases in the survey results.
The procedures used generally depend on past
relationships of survey data to final estimates. It
is assumed that these same relationships are continuing,
but periodic checks must be made to
verify this assumption and to true up the estimates.
Check data are obtained from a variety of sources,
but generally are in the form of records of marketings
or census enumerations. Information from
the U.S. census of agriculture and from annual
farm censuses conducted in some States has commonly
been used for this purpose.

Many factors affect the reliability of estimates
derived from nonprobability surveys. First, it is
necessary to evaluate the accuracy of the check
data used in establishing values. Errors in these
data will result in errors in the relationships derived
for past years. There is always the possibility
of error in assuming that past relationships of
survey data to final estimates will continue.
Comparability of survey data must be maintained for
the period in which relationships are derived. If
survey indications for past surveys are based on
selective data, indications used to make the current
estimate must be subject to the same kind
of selectivity for best results. Therefore, consideration
of comparability should be given to the list
samples, the sampling procedure and distribution,
and the survey response.




B-4 





Survey indications 

Direct-expansion indications are not possible
with nonprobability surveys because of the inability
to associate known probabilities with the
data collected. Therefore, most survey indications
are relationships estimated from the survey data
which can be applied to some assumed known
base. A brief description of some of the commonly
used nonprobability survey indications follows.

Ratio to land: Relations of an item to total
land in farms can be estimated from survey data.
Used primarily for crops, the sample total acreage
for a specified crop is divided by the sample
total farmland acreage. This provides a measure
of the proportion of farmland acreages used for
individual crops. The relations of any two items
on the questionnaire can be estimated in this
manner.

This indication is similar to the
above, but the control variable, such as capacity
of feedlots or grain storages, is known in advance
and is part of the sampling frame. Ratios estimated
from the sample totals can be expanded by
the known base totals for the population.
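
Both ratio indications reduce to the same arithmetic: a ratio of two sample totals applied to a base total that is taken as known. The sketch below uses made-up acreages, and the function and variable names are hypothetical.

```python
def ratio_indication(sample_item_total, sample_base_total, known_base_total):
    # Ratio of two sample totals, then expansion by the assumed known base.
    ratio = sample_item_total / sample_base_total
    return ratio, ratio * known_base_total

# Hypothetical sample: 1,000 acres of wheat on 8,000 acres of land in
# farms, with 2,000,000 acres of land in farms assumed for the State.
ratio, wheat_acres = ratio_indication(1000, 8000, 2_000_000)
# ratio is 0.125 and wheat_acres is 250000.0
```

The result is only as good as the base total and the representativeness of the respondents, which is the source of bias the text warns about.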

Average per farm: Averages per farm estimated
from survey data are used to estimate livestock.
These averages can be associated with estimates
of farm numbers. Averages obtained from mail
surveys can be quite biased because of widely
varied farm sizes, which may not be properly represented
among survey respondents.

Matched reports: Estimates of survey-to-survey
changes can be made by matching "identical farm"
reports from two successive surveys. This indication
has commonly been called the "current/current"
ratio. Indications are developed by applying
survey changes to the previous estimate.
Care must be taken in the matching process to
assure that the reporting units are comparable
between surveys. The procedure does not permit
new operating units to be included in the tabulations.
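
A small sketch of the matched-report calculation follows, assuming reports are keyed by a farm identifier. The identifiers, acreages, and previous estimate are invented for illustration.

```python
def matched_ratio(previous, current):
    # Use only farms that reported in both surveys ("identical farms").
    matched = previous.keys() & current.keys()
    prev_total = sum(previous[f] for f in matched)
    cur_total = sum(current[f] for f in matched)
    return cur_total / prev_total

# Hypothetical reports: farm_c dropped out, farm_d is a new unit.
previous = {"farm_a": 100, "farm_b": 200, "farm_c": 50}
current = {"farm_a": 110, "farm_b": 220, "farm_d": 80}

change = matched_ratio(previous, current)      # 330 / 300
previous_estimate = 50_000                     # assumed prior estimate
indication = change * previous_estimate
```

Here farm_d is ignored entirely, which illustrates why new operating units cannot enter a matched tabulation.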

A variation of this procedure is the "current/historical"
indication, which also measures change
from some previous period, but data for the prior
period is collected on the current questionnaire.
For example, a farmer would be asked to report
his previous year's acreage of each crop along
with current year's acreage. The advantage is that
all reports can be used for tabulation and no
matching is required, but it has been found that
the data reported by farmers for the preceding
year are often subject to error because of memory
bias or other reasons.

Yield indications: Mail surveys have retained
much of their usefulness for estimating and forecasting
crop yields. Perhaps one reason is that
yields do not vary greatly by size of farm. At
harvest, actual yields can be derived by obtaining
harvested acreage and comparable production
data. Indications for forecasting yields are based
on reports of condition or probable yield. Reported
condition consists of evaluations by growers
and crop reporters of the size of the current
crop expressed as a percentage of a hypothetical
full or normal crop. Expected or probable yield
is likewise a subjective judgment of crop prospects,
but is expressed directly as yield per acre.

Data interpretation

The assumptions that must be made to prepare
estimates from nonprobability survey indications
are factors that limit survey reliability. Several
methods most frequently used for minimizing or
interpreting the inherent biases should be mentioned.

Weighted averages: A procedure for minimizing
response biases is to use geographic or size group
stratification in summarizing the data. Known or
estimated weights are used to weight stratum
averages up to State estimates. The effect of a
poor distribution in sample response is minimized,
providing respondents have characteristics similar
to others in the same stratum. For example, crop
yields would normally be expected to be more
alike within a crop reporting district than within
an entire State. Average yields from the survey
are computed at the level of the crop reporting
district and weighted to a State average yield, using
district estimates of crop acreages for weights.
Size group stratification is used similarly.
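
The district-weighted State yield can be written in a few lines. The district names, yields, and acreages below are made up for illustration.

```python
def weighted_state_yield(district_yield, district_acres):
    # Weight each district's average survey yield by its estimated acreage.
    total_acres = sum(district_acres[d] for d in district_yield)
    weighted_sum = sum(district_yield[d] * district_acres[d]
                       for d in district_yield)
    return weighted_sum / total_acres

# Hypothetical district averages (bushels per acre) and acreage weights.
district_yield = {"northwest": 30.0, "central": 40.0}
district_acres = {"northwest": 100_000, "central": 300_000}

state_yield = weighted_state_yield(district_yield, district_acres)  # 37.5
```

An unweighted mean of the two districts would give 35.0, so the weighting matters whenever response is not distributed in proportion to acreage.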

Charts: Most nonprobability survey data are
interpreted in some way through charts which
pictorially describe past relations of survey data
to final estimates. The most common of these is
the simple regression chart, where the relations
are plotted, using the horizontal axis for locating
the magnitude of past survey indications and the
vertical axis for corresponding estimates. The
statistician prepares the estimate by determining
the best-fit location on the graph corresponding
to the current survey indication. The graph interpretation
is frequently done visually, although the
linear regression line is usually computed and
plotted to assist interpretation. Points on the
graph are identified by year so that recent year
relations can be given more influence if desired.

Figure 1. Example of a regression chart used to estimate
a State's winter wheat yield.

Figure 2. Example of a time series chart used in
estimating a State's stocks of wheat on farms.
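
The computed regression line that assists the visual reading of the chart can be sketched with ordinary least squares. The data points below are invented, and the sketch ignores the option of giving recent years more influence.

```python
def fit_line(x, y):
    # Ordinary least-squares line y = intercept + slope * x.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past years: survey indication (x) and final estimate (y).
past_indications = [28, 30, 32, 34]
final_estimates = [29, 31, 33, 35]

slope, intercept = fit_line(past_indications, final_estimates)
current_indication = 31
chart_estimate = intercept + slope * current_indication  # 32.0 for these points
```

In these made-up data the survey runs one unit below the final estimate every year, so the line simply adds that historical bias back to the current indication.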

Time-series charts are used for some commodities.
The horizontal axis is used for the sequential
plotting of time, and the levels of indications and
estimates are indicated on the vertical axis. Indications
and the corresponding estimates are distinguished
by different types of lines drawn to
show respective year-to-year changes. Current estimates
are set with the available knowledge of
these past relations between the level of estimates
and survey indications.

Trend is an important consideration for some
estimates, particularly in developing crop yield
forecasts. A time-series chart in addition to a
regression chart is sometimes used for this purpose.
The regression chart is used to present the
usual survey-estimate relationship. Deviations
from the regression line are plotted on the time-series
chart. These deviations plotted sequentially
illustrate the effect of time and allow a projection
to be made. Another method uses time as a second
variable for developing a multiple-regression
indication. In this way an allowance for trend is
incorporated into the indication. Additional variables,
such as precipitation, are occasionally used
in developing the multiple-regression indication.

Probability Surveys 

Estimates can be made from probability surveys
without dependence on prior survey relations
or benchmark data. With known probabilities,
raw data are expanded into unbiased
estimates of current agricultural activities. Also,
sampling errors are computed that provide the
statistician with a tool for evaluating the reliability
of estimates generated. Sampling errors not only
provide measures of precision, but the sources of
sample variation are useful in optimizing sample
designs and allocations. The quality of statistics
derived from probability survey data usually justifies
their higher costs.

Basic considerations for survey reliability are
sampling frame, survey design, and sample size.
Each is important in maintaining sampling errors
at acceptable levels, although constraints on sample
size are frequently imposed by budget limitations.
Measures of nonsampling errors are rarely
available. Much effort is made to minimize potential
nonsampling errors through survey training
programs, questionnaire design and testing, providing
precise survey procedures, and utilizing
comprehensive editing systems.

Enumerative survey

In SRS, "enumerative survey" refers to area
frame sample surveys in which data are collected
by personal interview. The basic estimator used
for area frame survey data is the unbiased direct
expansion. Raw survey data from each segment
are expanded by the reciprocal of the probability
of selection. Estimates are generally computed at
the stratum level for analysis purposes, but inferences
from enumerative survey data are seldom
made below the State level, because of relatively
large sampling errors. Segments are the primary
sampling units, hence tract data must be summed
to the segment level. Sampling errors are then
determined from the variation between segments.
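
A minimal sketch of the direct expansion follows, assuming simple random sampling of segments within each stratum so that every sampled segment has selection probability n/N. That assumption, the dictionary keys, and the figures are all introduced here for illustration.

```python
import statistics

def direct_expansion(strata):
    # Expand segment totals by N/n (the reciprocal of the selection
    # probability) and build the variance from between-segment variation.
    total = 0.0
    variance = 0.0
    for s in strata:
        N = s["segments_in_frame"]
        y = s["segment_totals"]  # tract data already summed to segments
        n = len(y)
        total += N * statistics.mean(y)
        variance += N ** 2 * (1 - n / N) * statistics.variance(y) / n
    return total, variance ** 0.5

# Hypothetical strata with made-up segment totals (acres of wheat).
strata = [
    {"segments_in_frame": 100, "segment_totals": [10, 20, 30]},
    {"segments_in_frame": 50, "segment_totals": [0, 4]},
]
state_total, sampling_error = direct_expansion(strata)
```

The first stratum contributes 100 x 20 = 2,000 and the second 50 x 2 = 100, while nearly all of the sampling error comes from the more variable first stratum.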

Ratios and ratio estimators are also utilized
with data from area frame surveys. These estimates
are particularly helpful in evaluating
changes from survey to survey. For the June
enumerative survey, ratios are computed by comparing
current survey data with previous-year
data for identical segments. Ratios are computed
at each level of summary, hence biases inherent
in ratio estimates are minimized. In expanding
previous and current matched data, consideration
is given to the fraction of total sampling units that
are comparable. If 80 percent of the segments
in a stratum are identical (following a 20-percent
annual rotation scheme), all expanded matched
data would be divided by an additional factor of
0.8. This allows for variations in the rotation
scheme. The ratio estimate is derived by applying
the ratio to the direct expansion estimate from the
previous year's survey. Estimated sampling errors
take into account the correlation or covariance
of the matched data.
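
The matched-segment ratio with the additional comparable-fraction factor can be sketched as below. The stratum records and values are hypothetical, and the sampling error (which would need the covariance of the matched data) is not computed.

```python
def matched_ratio_estimate(strata, previous_direct_expansion):
    current_total = 0.0
    previous_total = 0.0
    for s in strata:
        # Usual expansion factor, divided by the fraction of segments
        # that are identical in both years (0.8 under 20-percent rotation).
        factor = s["expansion"] / s["matched_fraction"]
        current_total += factor * sum(s["current_matched"])
        previous_total += factor * sum(s["previous_matched"])
    ratio = current_total / previous_total
    return ratio, ratio * previous_direct_expansion

# Made-up matched segment data for two strata.
strata = [
    {"expansion": 100, "matched_fraction": 0.8,
     "current_matched": [11, 22], "previous_matched": [10, 20]},
    {"expansion": 50, "matched_fraction": 0.8,
     "current_matched": [5, 5], "previous_matched": [4, 6]},
]
ratio, ratio_estimate = matched_ratio_estimate(strata, 4400.0)
```

When the matched fraction is the same in every stratum it cancels out of the ratio; it matters when rotation varies between strata, which is the situation the factor is meant to handle.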

A third estimator is derived from a ratio to
land area. This estimator is efficient for major
crops and other items that are highly correlated
with land area. The actual area of each segment
is measured from a scaled aerial photograph. The
relation of each item to the measured area is
calculated and this ratio is applied to the total base
land area at the State level. All concepts of ratios
and ratio estimates apply; however, the base or
total land area is assumed to be known without
error.

Somewhat more difficult are the theoretical concepts
associated with subsequent area frame surveys in which all
June tracts are first classified into strata and then
subsampled. Although it is a two-stage sample design, the
second stage of sampling is not confined to primary sampling
units, as it is in cluster sampling. Instead, the second
stage of selection is among all tracts classified according
to predetermined criteria using the June information. With
this sampling scheme it is quite likely that some segments
will have no tracts selected in the sample. Unbiased
direct-expansion estimates can still be generated by
associating the probabilities of selection (probabilities at
the first stage of selection multiplied by probabilities at
the second stage) with the data for each tract sampled. The
difficulty arises in computing sampling errors. The concept
assumes a product estimator where the factors are a
population estimate for total number of tracts within each
classification and an estimated average tract value for
tracts within each classification. The variance component
associated with estimating the number of tracts is computed
from the June enumerative survey, whereas the component for
between-tract variation must come from current survey data.
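
A minimal sketch of the two-stage direct expansion described above: each sampled tract value is divided by the product of its first-stage and second-stage selection probabilities. The data layout and the figures are assumptions made for illustration only.

```python
def two_stage_direct_expansion(tracts):
    """Sum of tract values weighted by the inverse of the overall
    selection probability (stage 1 times stage 2).

    `tracts` is an iterable of (value, p_stage1, p_stage2) tuples.
    """
    total = 0.0
    for value, p_stage1, p_stage2 in tracts:
        overall_probability = p_stage1 * p_stage2
        total += value / overall_probability
    return total


# Hypothetical sampled tracts: (reported value, P(stage 1), P(stage 2)).
sample = [(10.0, 0.1, 0.5), (20.0, 0.2, 0.25)]
expanded_total = two_stage_direct_expansion(sample)
```

Both tracts in this example have an overall selection probability of 0.05, so each reported value is expanded by a factor of 20.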

Ratio estimators are also used for surveys based on
subsamples of June area tracts. The ratios are computed by
relating current data to June data. The June enumerative
survey direct-expansion estimate becomes the base for
computing a ratio estimate. These estimates are particularly
useful for the July acreage update survey where correlations
are very high between actual planted acreages and those
reported during the June enumerative survey (which in some
cases are intended plantings).

Multiple-frame survey

The general estimation model for multiple-frame surveys
based on a list and area sampling frame is:

$$X = X_a + pX_{al} + qX'_{al}$$

where $X_a$ = the estimated total for the portion of the
population included only in the area frame;

$X_{al}$ = the estimated total for the population included
in both frames, computed from the area sample;

$X'_{al}$ = the estimated total for the population included
in both frames, computed from the list sample;

and $p + q = 1$.

Since $X_{al}$ and $X'_{al}$ are two independent estimates
of the same population (overlap domain), any values for the
weights p and q which sum to 1 will provide unbiased
estimates. Optimum weights will be inversely proportional to
the variances associated with each estimate. In practice,
weights are predetermined, utilizing information from prior
surveys. The value of q is usually large and is associated
with the greater efficiency of the list frame. For livestock
surveys, values of p = 0 and q = 1 are used. This equation is
often referred to as a "screening" estimator. In the variance
computation, $X_a$ and $X_{al}$ are considered nonindependent
components of the estimating equation.
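
The multiple-frame estimator and the inverse-variance weights mentioned above can be sketched as below. The function and argument names are hypothetical, and the weight calculation treats the two overlap-domain estimates as independent, ignoring the covariance between the two area-sample components.

```python
def inverse_variance_weights(var_area_overlap, var_list_overlap):
    """Weights p and q (summing to 1) inversely proportional to the
    variances of the two overlap-domain estimates."""
    inv_area = 1.0 / var_area_overlap
    inv_list = 1.0 / var_list_overlap
    p = inv_area / (inv_area + inv_list)
    return p, 1.0 - p


def multiple_frame_estimate(x_area_only, x_overlap_area, x_overlap_list, p):
    """X = X_a + p * X_al + q * X'_al with q = 1 - p."""
    q = 1.0 - p
    return x_area_only + p * x_overlap_area + q * x_overlap_list


# Hypothetical figures: the list-sample estimate is four times as precise.
p, q = inverse_variance_weights(4.0, 1.0)
combined = multiple_frame_estimate(100.0, 50.0, 60.0, p)

# "Screening" estimator used for livestock surveys: p = 0, q = 1.
screening = multiple_frame_estimate(100.0, 50.0, 60.0, 0.0)
```

With p = 0 the area sample contributes only the part of the population that is missing from the list.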

Little use has been made of ratios and ratio estimates in
multiple-frame sampling. Direct-expansion estimates have
proven to be efficient and allow complete flexibility in
developing the sampling plan for each survey.

Objective yield survey 

Objective yield surveys provide crop yield information for
estimates or forecasts based directly on counts,
measurements, and weights of the crop made from small plots
in a probability selection of sample fields. When a crop is
mature and ready for harvest, yield can be estimated by
harvesting and weighing production from these plots of known
size and expanding to a yield per acre. This method of
preharvest sampling to estimate yields is often referred to
as "crop cutting." Similar procedures are used for tree
crops, but yield is computed in terms of production per tree
and observations are usually made on sampled limbs. For a
mature crop, estimating yield becomes primarily a sampling
problem. Theoretically, samples can be designed to produce
estimates of yield with any desired degree of precision.

The same sampling considerations are important for
objective surveys used in forecasting yields. In addition,
early-season plant characteristics must be identified which
can be used to predict yield at maturity. A forecast model
(often a regression equation) has to be developed that
describes the relations between the prediction variables and
the final outcomes. For all crops, it is usually helpful to
analyze yield in terms of two components: number of fruits
and weight per fruit. Reliable forecasts of number of mature
fruits are readily possible, since most plants set fruit at
a fairly early stage of maturity. Identifying useful plant
characteristics and predicting weight per fruit is more
difficult, since growth of the fruit typically continues
until maturity.

An additional factor of yield which must be taken into
account for SRS estimates is harvesting loss. Biological
(gross) yields can be estimated from preharvest objective
sampling but these estimates overstate the production that
is actually hauled from fields and can enter marketing
channels. To estimate net yield, special postharvest surveys
are conducted to measure all production remaining in fields
after harvest. These losses, which are measured by gleaning
small sample plots immediately following harvest, must be
subtracted from gross yield.
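
The "crop cutting" expansion and the harvest-loss adjustment can be illustrated with a short sketch. The units (pounds and square feet) and all figures are assumptions chosen for the example; the source does not specify them.

```python
SQUARE_FEET_PER_ACRE = 43560.0


def plot_yield_per_acre(plot_weight_lb, plot_area_sqft):
    """Expand the weight harvested from a plot of known size to a
    per-acre yield."""
    return plot_weight_lb * SQUARE_FEET_PER_ACRE / plot_area_sqft


def net_yield_per_acre(gross_yield, harvest_loss):
    """Net yield is gross (biological) yield less the gleaned loss."""
    return gross_yield - harvest_loss


# Hypothetical plot: 10 lb harvested before harvest, 0.5 lb gleaned
# afterwards, both from a plot of 43.56 square feet.
gross = plot_yield_per_acre(10.0, 43.56)
loss = plot_yield_per_acre(0.5, 43.56)
net = net_yield_per_acre(gross, loss)
```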

Field crops: 

Concepts and general methodology used in objective surveys
for forecasting and estimating yields are similar for all
field crops. Sample fields are selected from fields
identified during the June enumerative survey as having the
crop of interest. A systematic sampling scheme is used for
selection, following a geographical arrangement of fields.
Self-weighting samples are achieved by assigning
probabilities of selection which are proportional to
expanded field acreages. This facilitates summarization and
has proven to be efficient for estimating purposes.
Observations are made on two randomly selected plots (units)
in each of the selected fields.
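
One common way to carry out a systematic selection with probabilities proportional to size is sketched below. It assumes the fields are already listed in geographical order and uses a random start within the first sampling interval; the function name and the acreages are hypothetical, and this is not claimed to be the exact SRS routine.

```python
import random


def systematic_pps(sizes, n, rng):
    """Select n units systematically with probability proportional to
    size. Returns the indices of the selected units; a unit larger than
    the sampling interval can be selected more than once."""
    total = float(sum(sizes))
    interval = total / n
    start = rng.random() * interval
    selected = []
    cumulative = 0.0
    k = 0  # index of the next selection point, start + k * interval
    for i, size in enumerate(sizes):
        cumulative += size
        while k < n and start + k * interval < cumulative:
            selected.append(i)
            k += 1
    return selected


# Hypothetical expanded acreages for four fields in geographic order.
acreages = [10.0, 20.0, 30.0, 40.0]
chosen = systematic_pps(acreages, 2, random.Random(0))
```

Because each field's chance of selection is proportional to its acreage, the sample is self-weighting: the per-acre yields of the selected fields can simply be averaged.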

Objective yield surveys are planned to coincide with the
publication of production forecasts and estimates in the
monthly Crop Production report. During the first survey
month, crop maturity will vary considerably by area of the
country. Appropriate counts, measurements, and other
observations are made for each sample that will be used in
the forecast models. Plant characteristics used as
prediction variables change as maturity progresses. At an
early stage, for example, a count of plants may be the only
data available, but it is valuable in forecasting the number
of mature fruits. If no characteristics are available to
predict weight per fruit, historical averages will be used
for the sample. As the crop matures, other variables become
important. Actual fruit counts are used, and weights and
measurements of the immature fruits are often useful in
predicting final weight per fruit. Simple linear- and
multiple-regression models are most often used to describe
past relations between the prediction variables and the
final observations at maturity. Typically, relations
observed over the preceding 3-year period are used in
current forecast equations. Forecasts of gross production
are computed for each sample. Plots for most crops include
two adjacent rows of predetermined length. Measurements are
made to determine row spacing so that conversions can easily
be made to yield per acre. An adjustment is made for
expected harvesting losses, based on past averages.
Individual sample yields are averaged to arrive at State
estimates. Sampling errors are based on variation between
sample yields.
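
A simple linear forecast of the kind described above might look like the following sketch. It fits a line relating an early-season measurement to final weight per fruit from past seasons, then forecasts net yield for a sample as fruit count times predicted weight, less an expected harvesting loss. All names and figures are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_net_yield(fruits_per_acre, early_measurement,
                       slope, intercept, expected_loss):
    """Forecast gross yield as count times predicted weight per fruit,
    then subtract the expected harvesting loss."""
    predicted_weight = intercept + slope * early_measurement
    gross = fruits_per_acre * predicted_weight
    return gross - expected_loss


# Hypothetical past relation between an immature-fruit measurement and
# the final weight per fruit.
slope, intercept = fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
forecast = forecast_net_yield(1000.0, 2.0, slope, intercept, 100.0)
```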

As the season progresses and crops mature, the individual
sample yields provide data for estimates rather than
forecasts. Final preharvest observations are made as near
harvest as practicable. Similarly, for best results it is
desirable to do the postharvest work immediately following
farmer harvest. When the information [...] harvesting losses
are used in computing net yields.

Tree crops: 

Sampling frames used for blocks (fields) of trees have been
developed by various means. In some cases, nearly complete
lists of growers, classified by size of operation, have been
made available through trade or marketing associations. Area
frames have been constructed by identifying blocks of trees
on aerial photographs. Stratification according to age of
tree reduces sampling variability in some applications. In
addition to its uses in sampling, the frame usually becomes
the basis for estimating the population of trees.


Blocks of trees are sampled with probabilities proportional
to the number of trees or acres, which results in a
self-weighting sample. Counts are usually made on two to
four trees per block. A random method is used for selecting
a "pivot" tree with additional count trees selected nearby.
This cluster reduces counting time within the block. The
random-path method is commonly used for selecting count
limbs on a tree. Beginning at the base and proceeding up the
tree, a random selection is made at each point of branching
until a count limb of suitable size is obtained.
Probabilities proportional to the cross-sectional areas of
the limbs are usually used in the selection process to gain
sampling efficiency. An alternative to the random-path
method is to select a primary limb as described, but map out
the remaining branches into suitable count limb sections. A
random choice of one or more of these sections can then be
used for counting purposes. On mature trees, 5 to 10 percent
of the tree is usually counted. The probabilities associated
with each stage of selection must be used in expanding the
limb counts to an estimate of fruit per tree.
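
The random-path method can be sketched as a walk up a tree structure. In this illustration the tree is a nested dictionary, a terminal limb stands in for "a count limb of suitable size," and the keys `csa` (cross-sectional area), `fruit`, and `children` are hypothetical names chosen for the example.

```python
import random


def random_path(tree, rng):
    """Walk from the base to a terminal limb, choosing a branch at each
    fork with probability proportional to cross-sectional area.
    Returns the chosen limb and its overall selection probability."""
    node = tree
    probability = 1.0
    while node["children"]:
        branches = node["children"]
        areas = [branch["csa"] for branch in branches]
        chosen = rng.choices(branches, weights=areas, k=1)[0]
        probability *= chosen["csa"] / sum(areas)
        node = chosen
    return node, probability


def expand_to_tree(limb, probability):
    """Expand the limb count to an estimate of fruit per tree."""
    return limb["fruit"] / probability


# Hypothetical tree with one fork; fruit load is proportional to limb size.
tree = {
    "csa": 4.0, "fruit": 0, "children": [
        {"csa": 1.0, "fruit": 10, "children": []},
        {"csa": 3.0, "fruit": 30, "children": []},
    ],
}
limb, prob = random_path(tree, random.Random(1))
fruit_per_tree = expand_to_tree(limb, prob)
```

When fruit load is roughly proportional to cross-sectional area, as in this example, every possible path gives nearly the same expanded estimate, which is the efficiency gain the text refers to.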

Once fruit is set, forecasting becomes the task of
projecting drop and growth. Most droppage occurs immediately
following bloom, after which the fruit counts become
relatively stable. Predicting weight of mature fruit is done
by relating immature size or weights to final weights. Drop
and growth patterns observed in past surveys are a
requirement for the current forecasts.

Periodic surveys are used to update the projections of
fruit drop and growth until harvest. An allowance must be
made for fruit remaining after harvest, particularly if
mechanical harvesting equipment is to be used. Since blocks
are the primary sampling units, sampling errors of estimated
production per tree are computed from variation between
blocks.

PREPARATION OF ESTIMATES 

Forecasts and estimates represent the combined effort of
both the State Statistical Offices (SSO's) and the
Washington, D.C., offices. Most sample data are collected,
edited, summarized, and analyzed in the SSO's. State
statisticians prepare the initial forecasts or estimates for
their States and transmit them with supporting data and
comments to the Crop Reporting Board in Washington for
review. An explanation of unusual local conditions or other
pertinent information affecting an estimate is given in the
statistician's comments.

In Washington, the State data are summarized nationally for
each item. Estimates recommended by the State statisticians
are reviewed by commodity specialists of the Crop Reporting
Board. The reviewers have all the survey information that
was available to statisticians in the States and can
evaluate the data at the national and regional levels. For
many commodities, State survey indications are summed for
the U.S. level and a national estimate is set first. These
procedures permit the use of check data and other survey
information available at the national level. For example,
some of the probability survey data are extremely valuable
at the national and regional levels, but are more limited in
value for State estimates because of relatively large
sampling errors.

For all major commodities, including livestock species and
crops identified as speculative, members of a formal Crop
Reporting Board convene to review and adopt the official
estimates. Each member makes an independent interpretation
of all available data and recommends an estimate. The
Chairman of the Board reviews these recommendations and
reconciles differences of opinion.

RESEARCH 

SRS continually conducts research aimed at improving the
quality of its services to the public. The principal areas
of study are briefly described below.

Sampling-Frame Construction and
Maintenance

Through the past several years research and operational
experience have resulted in an evolution of area frame
construction. Most States now have a land area sampling
frame based on stratification of land according to
agricultural use. A recent addition to the basic design has
been the use of interpenetrating sampling to select units
from the frame. Within a land use stratum, a set of
independent samples is selected, using a random method.
Interpenetrating sampling facilitates an orderly rotation
plan of sampling units for enumeration. Other advantages are
that a replication can be used as an independent estimating
sample for special purposes, and the land use stratum
variance may be computed quite easily by using the
replicated means or totals.
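
The variance calculation from replicated totals can be sketched as follows, assuming each replicate is an independent sample that yields its own estimate of the stratum total. The function name and the figures are hypothetical.

```python
from statistics import mean, variance


def replicated_estimate(replicate_totals):
    """Combine k independent replicate estimates of a stratum total.

    The stratum estimate is the mean of the replicate estimates, and its
    estimated variance is the sample variance of the replicates divided
    by the number of replicates."""
    k = len(replicate_totals)
    estimate = mean(replicate_totals)
    estimated_variance = variance(replicate_totals) / k
    return estimate, estimated_variance


# Hypothetical expanded totals from four interpenetrating replicates.
totals = [100.0, 110.0, 90.0, 100.0]
stratum_estimate, stratum_variance = replicated_estimate(totals)
```

No knowledge of the within-replicate design is needed for this calculation, which is why the text calls it easy to compute.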

Research in land area sampling-frame construction centers
on fine tuning, or introducing greater efficiency in the
methodology. Current investigations cover optimum
stratification and segment size; ways to improve accuracy
and quality-control measures; and exploration of new frame
materials, such as high-altitude or satellite photographs.
Since the land area sampling frame is the only complete
sampling frame, SRS must maintain and improve the efficiency
of its use, even though SRS relies heavily on the
sophisticated application of list files as a partially
complete frame for estimation.

A second area of research is in developing name list files
suitable for use in multiple-frame sampling. A major problem
associated with constructing such a file is identifying
duplication of names within the file. The process of
identifying duplication using computer technology is called
"record linkage." Specifically, record linkage brings
together two or more separately recorded pieces of
information concerning the name of a particular individual
or operation. Tasks within the overall heading of record
linkage include data manipulation (the process by which
unlike records are restructured to make them more comparable
without changing the basic information) and information
coding (the process of removing variations of alpha or
numeric information by substituting a common code system).
By performing these two steps, the similarity of records is
increased without changing their information content. Once
these two processes are completed, it must be decided if
individual records are linked with other records.
Probabilities are used by a model to create the likelihood
of link or nonlink, and a hypothesis test is used in
deciding if two records are indeed the same. Finally, a
method is developed by which information gained about name
records may be retained so that the process of identifying
unique list name sampling units improves over time through
survey use.
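
The standardization and linking steps can be illustrated with a small sketch. It is not the probability model and hypothesis test described above: a plain string-similarity score from Python's `difflib` with an arbitrary threshold is swapped in for the link decision, and the code table, function names, and threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical common code system for frequent name variations.
CODES = {"BROTHERS": "BROS", "COMPANY": "CO"}


def standardize(name):
    """Restructure a name record: drop punctuation, upper-case it, and
    replace known variants with a common code."""
    words = re.sub(r"[^A-Za-z0-9 ]", " ", name).upper().split()
    return " ".join(CODES.get(word, word) for word in words)


def is_probable_link(name_a, name_b, threshold=0.9):
    """Declare a link when the standardized names are similar enough.
    The similarity score and threshold stand in for a likelihood model."""
    score = SequenceMatcher(None, standardize(name_a),
                            standardize(name_b)).ratio()
    return score >= threshold


same = is_probable_link("Smith Brothers Farm", "SMITH BROS. FARM")
different = is_probable_link("Smith Bros Farm", "Jones Dairy")
```

After standardization the first pair of names becomes identical, so they are treated as duplicates of one list sampling unit.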

Nonsampling Error 

Research on nonsampling errors is directed at the survey as
an instrument to measure certain items of interest, such as
crop acreage or numbers of livestock.

Nonsampling errors are to be distinguished from the
sampling error, which arises from the use of a sample rather
than the entire universe of elements to be studied. All
other types of error are called "nonsampling errors," a term
often loosely considered as synonymous with "response
errors" and "measurement errors." Nonsampling errors are not
necessarily related to the size of the sample, as are
sampling errors. They may arise from errors of measurement,
since any measuring instrument will vary in its ability to
measure precisely the item of interest. A survey is subject
to many sources of nonsampling errors: the frame may be
unsatisfactory, sample selection may be biased,
questionnaire design may be deficient, improper information
may be recorded, mistakes may be made in processing the
data, and data may be missing because of lack of response,
etc.

Unlike sampling errors, nonsampling errors present
considerable difficulty in the estimation of the variability
that may be associated with them. It may be possible to
measure some particular component of such errors, but there
may still exist some unknown components. As a result, there
has been little practical work done in the area of
estimating nonsampling errors. More progress has been made
in identifying sources of nonsampling errors.

Identifying the sources of nonsampling errors is the first
step in developing procedures to remove them. Analysis of
survey data and comparison of results of independent surveys
measuring the same items may indicate sources of nonsampling
errors. Sometimes such analyses or comparisons indicate that
nonsampling errors are present, but do not identify the
sources. If this occurs, an alternative is to reinterview by
an independent method that is considered to be more
accurate. This can be done with a subsample of survey
respondents. It is assumed that the reinterviewing team is a
more accurate measuring instrument, because better
interviewers are used and the questionnaire is structured in
greater detail to reveal the correct values if they are not
obtainable by a direct question.

After sources of nonsampling errors are identified, it is
necessary to develop procedures to measure the degree to
which they affect the items of interest. One procedure is to
use replicated sampling to build into a survey an
experimental comparison of several different measuring
processes, providing the measuring devices do not have the
same type of systematic errors. Another procedure is to
assign replications to interviewers to determine the
variability in survey data that is attributable to the
interviewers when it is not a systematic error. The idea is
to make part of the survey a controlled experiment with
precautions, such as randomization, that are typical of good
experimentation.

Refusals are responsible for part of the nonsampling errors
due to nonresponse. Procedures are developed and tested not
only to reduce the number of refusals, but also to provide
estimates of those that remain refusals.

Remote Sensing 

"Remote sensing" means measuring an object or phenomenon
from a distance, whether by photography or other radiometric
technique using microwave instruments, spectroradiometers,
multispectral scanners, etc. The measurements are of
electromagnetic energy which is emitted, scattered, or
reflected by the objects observed. Different objects return
different kinds and amounts of energy. Remote sensing
utilizes these detectable differences to identify ground
objects or phenomena from the air or from space.

Crop identification and acreage measurement have been
recognized as potential applications of remote sensing. An
ideal approach might be to make acreage estimates from
sensor information every 24 hours, but the data-handling
problem and the lack of an all-weather sensor system make
this impossible except in special situations. Consequently,
other ways have to be found to use remote-sensing data.

Several possible approaches are: (1) double sampling or
multistage sampling, (2) multiple-frame sampling, or
(3) using space imagery as an area frame on which broad land
use classifications have been done. This land use
classification would then be used in designing a stratified
sample. Or space imagery could be used as a frame from which
one could select a subsample of aircraft flight strips and,
within flight strips, select area segments. These area
segments could then be photographed at a larger scale or
enumerated on the ground. This is a multistage sample using
several different kinds of information.

Likewise, space imagery of a county or State could be
classified according to the crops of interest. From this
classification one would select a sample (or use an existing
sample) of area segments and collect the necessary
information about these areas on the ground. This is a
double-sampling technique in which the space information is
the large sample, and ground survey provides the more
detailed information. If the correlation between the ground
information and the space data is high, substantial gains
can be realized in making crop estimates for the total area.
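
A double-sampling regression estimator of the kind implied above can be sketched as follows. It assumes a satellite-classified acreage is available for every segment in the large sample and a ground-measured acreage for a subsample; the function and variable names and the figures are hypothetical.

```python
def double_sampling_regression(x_large, x_small, y_small):
    """Regression estimate of the mean ground value per segment.

    x_large: space-imagery values for all segments in the large sample
    x_small: space-imagery values for the ground-surveyed subsample
    y_small: ground values for the same subsample
    """
    n = len(x_small)
    mean_x_small = sum(x_small) / n
    mean_y_small = sum(y_small) / n
    sxx = sum((x - mean_x_small) ** 2 for x in x_small)
    sxy = sum((x - mean_x_small) * (y - mean_y_small)
              for x, y in zip(x_small, y_small))
    slope = sxy / sxx
    mean_x_large = sum(x_large) / len(x_large)
    # Adjust the subsample mean for how the subsample's imagery values
    # differ from those of the full large sample.
    return mean_y_small + slope * (mean_x_large - mean_x_small)


# Hypothetical classified acreages (x) and ground acreages (y).
x_all = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_sub = [1.0, 2.0, 3.0]
y_sub = [2.0, 4.0, 6.0]
adjusted_mean = double_sampling_regression(x_all, x_sub, y_sub)
```

The stronger the correlation between classified and ground acreage, the more the adjustment reduces the sampling error relative to using the ground subsample alone.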

Space imagery may also provide more efficient estimates by
providing supplementary data. For example, it may be
possible to classify crops by frame units in the present
area frame. This would mean that if one were interested in
corn, he could select the sample from frame units with
probability proportional to the acreage classified as corn.
If the correlation between the classified corn acreage and
the actual acreage was high, gains in estimation using ratio
and regression techniques could be realized.

Until an all-weather satellite is developed, an estimating
technique must be developed that can be used where satellite
coverage is incomplete. One solution is to use
multiple-frame estimating techniques, such as using the
space imagery to estimate the cloud-free area, and the
aerial photographs and ground enumeration estimates for the
area covered by clouds on the space imagery. Then, by proper
weighting, all three data sources are combined to obtain an
estimate for the total area.

Remote sensing has some potential in livestock estimation,
particularly in hard-to-get-to areas or in areas of
nonresponse. At present, this approach is limited to aerial
photography with sufficient resolution and to areas where
livestock occupy open areas, or areas with limited
vegetation.

Yield Forecasting and Estimation

Research directed toward the development of objective
methods of estimating and forecasting yields is conducted
for a wide variety of crops. The estimation of crop yields
at harvest and forecasting of yields yet to be realized are
two distinct phases of the research effort. For most crops,
the development of methods of estimating harvesting losses
constitutes an additional phase.

Crop yield estimation is based on the observation of plant
and fruit characteristics just prior to harvest, at harvest,
or soon after harvest is completed. Research in estimating
biological yield, harvested yield, and harvest losses
involves developing methods which rely on statistical
sampling and estimation theory. For purposes of efficient
sampling and estimation, it is often useful to treat yield
as the product of components such as weight or size per
fruit, fruit per plant, and plants per acre.

Forecasting of yields involves predicting what has not yet
happened. Methods of forecasting the final yield while a
crop is still immature are obviously more difficult to
develop than estimation procedures at harvest. Crop yields
are the culmination of many factors. These factors are
generally associated with the plant, its location, weather,
and production practices. The timing and interaction of
weather factors and the extremely complex interactions of
all important factors make their direct use in predicting
final yields extremely difficult. Fortunately, observations
of the immature crop can be made which are often useful in
predicting the resulting yield. Crops in an immature stage
of development are a reflection of the collective and
interacting effects of these factors over a portion of the
growing season. Inasmuch as these same factors also
constitute a primary influence on the mature crop,
observations made at an immature stage provide a good basis
for yield forecasts.

To develop successful methods of forecasting yields, it is
necessary to discover specific plant characteristics which
are useful predictors of yield. A comprehensive
understanding of the fruiting behavior of a crop is the
essential first step in the development of the predictive
models. Forecast models designed to relate these
characteristics to yield or its components may be based upon
knowledge, verified by experimental studies, about plant
growth and development during the season and time-related
growth patterns. This knowledge may be acquired primarily
through agricultural research. Special investigative surveys
are made to fill gaps in previous research and to adapt the
models to current practices. In addition to models which
rely on the repeatability of plant growth and patterns
adjusted for current fruit development, regression models
based on the stability of parameters between years are in
use. These models often incorporate the developmental stage
of the plant and its fruit in order to utilize unique model
parameters for individual maturity categories by States or
agricultural regions.

Forecasting crop yields also requires efficient estimation
of variables used in the models which have been developed.
Sampling and estimation theory is utilized to achieve this
efficiency. Since sampling considerations are usually
sufficiently compatible for the predictive variables and
estimates of final yield, the same sampling design can be
used for obtaining both immature and mature plant and fruit
observations. Thus relations between observations at various
stages of maturity may be studied in great detail at the
common elementary unit level or at other levels in a
hierarchical sampling design.



APPENDIX C 


THE REMOTE SENSING OF BARE FIELDS FOR CROP ACREAGE 

ESTIMATION 

If automatic processing of LANDSAT digital data for
full-scale crop surveys is to become a reality, the solution of
crop classification problems by use of spectral signatures of
growing crops is required. In particular, considerable effort
must be expended on the technical problems of: (i) signature
extension, (ii) supervised and unsupervised classification
"learning" algorithms, (iii) spectral signature analog areas,
etc., all applied to the crops in various stages of their
growth cycle.

On the other hand, principal investigator Stanley A.
Morain has done a successful Kansas 10-county winter wheat
study relying on the correct classification of freshly plowed
"wheat" fields - implying the intention to plant wheat - with
subsequent adjustments due to the growth and harvestability of
the actual wheat.* His method required visual interpretation
of the imagery, and thus may not be found suitable for
adaptation to automatic processing. Concerning the difference in
approach between Morain's study and others, note the following
points:

*Kansas Environmental and Resource Study: A Great Plains Model;
Extraction of Agricultural Statistics from ERTS-1 Data of Kansas,
S.A. Morain, Type III Final Report under contract NAS 5-21822,
Task 4, February 1974.



(1) It is relatively "easy" to discriminate freshly 
plowed fields from the same fields covered with stubble from 
the last harvest, or fallow, i.e., containing some plant cover 
or containing worthless crops left to rot or used for forage. 

(2) Crop calendars, throughout the world, are well
known and documented (in the statistical sense).* This does not
give one certainty as to what will be planted at a particular
point in time in a specified field; but it provides a high
probability that a known crop will be there, or in the case of
crop rotation, that one out of two or three crops will be there.

(3) The intelligent use of crop calendars, as by
Morain, should provide an excellent database together with
LANDSAT data from which to construct the initial acreage
estimates. These must be corrected later for losses (very
occasionally also gains) due to hail, flooding, late frost,
insect infestation, blight and farmers' decisions not to harvest.
These points are discussed in more detail in the notes at the
end of this appendix.

Thus, the initial LANDSAT acreage estimates based on
plowed fields correspond to USDA/SRS "planting intentions,"
but of course are much more nearly objective. Furthermore, they
can be done on a near census-type approach, rather than using
a tiny probability sample with relatively large sampling errors.

*Agricultural Atlas



The crop calendars, which should be very detailed, contain the
essential information for classifying with LANDSAT a large
fraction of agricultural acreage in the U.S. (also in other
countries with similar agricultural practices) at the time of
planting* - or shortly before. This provides a good estimate
of planting intentions acreage. The fields should be catalogued,
for later information retrieval, so that the growth of a
healthy crop can be verified, or in cases of severe crop stress
the acreage can be accordingly reduced. Eventually, the LANDSAT
system may also be capable of detecting crop condition
sufficiently accurately to allow for the measurement of yield by
using intertemporal data on already classified fields, thus
supplying a complete remote sensing system for obtaining crop
production estimates. In the meanwhile, it is important that
the acreage estimation be done as well as possible with LANDSAT.

Investigators who are working with remote sensing of
growing crops as compared to bare fields appear to be attempting
to solve a much more difficult task, i.e., of resolving the
intricate spatial and temporal differences in spectral signatures
of growing crops. Granted there is a need for this effort in
attempting to develop a complete crop production measurement
system with LANDSAT.

*Which varies both by crops and by country, and within
country, by latitude and geography as well.



But pioneers of the crop survey applications effort might have
a better chance of succeeding in the near future if they would
start with the acreage estimation, using the available
information as to what most likely will be planted in the freshly
plowed fields from knowledge of the local planting times.

Writing in the Type III Final Report (1974) of "Kansas 
Environmental and Resource Study: A Great Plains Model," Morain 
stated: 

"The results presented here demonstrate that a 
simple method for winter wheat identification may be de- 
veloped given an adequate prior knowledge of local environ- 
ment and crop cycle. The method appears to be applicable 
to other crops if suitable distinct crop cycle events may 
be defined. Knowledge of the local environment is critical 
if the interpretation is to be successfully conducted. 
Components of the local environment data set can be taken 
directly from the ERTS-1 imagery (Williams and Coiner, 1973) 
but other components are best developed at the local level. 
Furthermore, surface observations for a small number of 
fields from each environmental area would be a necessity. 

The necessity for (1) surface observation, (2) knowledge of
the local environment, (3) knowledge of the local crop cycles,
and (4) the modest amount of equipment and training required
to perform these interpretations make this method suitable 
for implementation at the local (county) level." 


NOTES 

1. Acreage nearly always decreases from planting
time onwards through the growing season due to the simple fact
that crops suffering various kinds of stress may be (i) plowed
under, (ii) left to rot in the field, or (iii) destroyed completely
by hail or floods. Nevertheless, occasionally there are
increases due to replanting with another crop, e.g., a corn crop
is planted on May 1, damaged by flooding on May 15 and
subsequently plowed under in late May. The same field is then
replanted with soybeans on June 1, resulting in a net loss of
corn acreage, but a net gain of soybean acreage.

2. Plowing a field under may be done at various
points in the agricultural cycle:

• post-harvest and pre-planting, usually in Spring,

• pre-harvest by farmer's decision relating to
expected profits.

In the latter case, various reasons for plow-under exist:

• to give nutrients to the soil,

• to allow for a second crop to be planted.

Whenever plow-under occurs in preparation for planting a field,
it is a normal part of agricultural practice.

3. The economic decision not to harvest crops already
planted happens very infrequently in poor countries. In rich
countries it may be done for reasons of crop stress, poor yield or low
price expectations and is usually accompanied by plow-under.
However, in the case of the 1974 flooding in the Mississippi Valley,
the crops were left to rot in the fields. If there is a decision
to replant a field, it will usually allow a small time window
for LANDSAT to observe the change in the field - perhaps a
week at most.

4. In tropical countries, the ability to grow more
than one crop per year makes crop classification by remote
sensing more difficult. This remark applies particularly to
India. Nevertheless, there may be a chance to observe the
plowing between crops.



APPENDIX D 


SAMPLING PROBLEMS IN REMOTE SENSING CROP SURVEY APPLICATIONS 


The LACIE Program's Sample Design 

Suppose that the sample consists of M area segments
(e.g., 5x6 mile rectangular areas, as in LACIE) selected
according to a stratified sampling plan for the crop in question.
Each segment contains N pixels, of which a fraction, $f_i$, are
cloud covered. Because the supervised classification procedure
uses previously designated training fields within each segment,
it is necessary to obtain a certain minimum level of cloud-
free pixels. Segments which have too large a value of $f_i$ will
be rejected (approximately $f_i$ > .20). Wheat acreage is
estimated by a weighted sum of the "wheat" pixels in the segments
which pass the cloud cover test:



$$
W = a \sum_{i=1}^{M} \delta_i \, w_i \,
    \frac{\sum_{j=1}^{N} \lambda_{ij} \, x_{ij}}
         {\sum_{j=1}^{N} \lambda_{ij} / N}
$$

where:

a = area of 1 pixel

$\delta_i$ = 1 if the i-th segment is accepted,
0 if the i-th segment is rejected because of too much cloud cover

$\lambda_{ij}$ = 1 if pixel (i,j) is cloud free,
0 otherwise

$w_i$ = sampling weight for the i-th segment as determined by the
sampling plan

$x_{ij}$ = area fraction assigned to wheat in pixel (i,j) from
classification (1 or 0 perhaps?)

Rewriting,

$$
W = a \sum_{k=1}^{m} w_{i_k} \, \frac{N}{n_k} \sum_{l=1}^{n_k} x_{i_k j_l}
$$

where $m \le M$, $n_k \le N$, $\{i_k\}$ represents those segments
for which $f_i < f_0$, and $\{j_l\}$, $l = 1, \ldots, n_k$, represents
those pixels in the $i_k$-th segment which are cloud-free.
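
The estimator described above (a weighted sum over accepted segments,
with each segment's cloud-free wheat count scaled up by its cloud-free
fraction) can be sketched in Python. This is a minimal illustration
under stated assumptions, not the LACIE implementation: the function
name, the argument names, and the list-of-lists data layout are all
hypothetical, and the 0.20 default is the approximate rejection
threshold quoted in the text.

```python
from typing import Sequence


def wheat_acreage_estimate(
    pixel_area: float,
    weights: Sequence[float],
    cloud_free: Sequence[Sequence[int]],
    wheat: Sequence[Sequence[float]],
    max_cloud_fraction: float = 0.20,
) -> float:
    """Weighted sum of cloud-corrected wheat pixel counts over accepted segments.

    weights[i]       sampling weight w_i of segment i
    cloud_free[i][j] 1 if pixel (i, j) is cloud free, else 0 (lambda_ij)
    wheat[i][j]      wheat area fraction assigned to pixel (i, j) (x_ij)
    """
    total = 0.0
    for w_i, lam_i, x_i in zip(weights, cloud_free, wheat):
        n_pixels = len(lam_i)                      # N
        n_clear = sum(lam_i)                       # n_k, cloud-free pixels
        cloud_fraction = 1.0 - n_clear / n_pixels  # f_i
        if cloud_fraction > max_cloud_fraction:
            continue                               # segment rejected
        wheat_clear = sum(l * x for l, x in zip(lam_i, x_i))
        # Scale the cloud-free wheat count up to the whole segment.
        total += w_i * wheat_clear / (n_clear / n_pixels)
    return pixel_area * total
```

Because a fully clouded segment always exceeds any threshold below 1,
the division by the cloud-free fraction is never reached with a zero
denominator in that range.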

There are two problems with this: (1) the subset of the
weights in the acceptable (cloud-free) segments does not
provide the correct normalization, and (2) the number of segments
(m) and the number of cloud-free pixels in each segment ($n_k$)
are random variables.

Problem (1) could be attacked by artificially forcing the
weights to reflect the actual segment selection process.
However, this throws the burden onto Problem (2), as the
corrected weights will now depend on m and [illegible].

With regard to LACIE, the problem is further
complicated by the Group III procedures: for counties which are
not represented in the reduced (acceptably cloud-free) sample
of segments at all, ratio estimates are concocted using last
year's census figures. While this may be a desirable step for
regional or district reporting purposes, it adds no useful
information to the national acreage estimate and creates further
problems for statistical analysis of the properties of the
estimator. The national acreage estimate should be handled in
a way that optimally uses the current acreage information from
remotely sensed data after cloud cover screening. This
purpose is not served by adding in agricultural survey or census
data in an ad hoc manner to compensate for missing segments.

Cloud Cover and its Effect on Crop Acreage Estimates 

Cloud cover has two effects on the statistical
properties of remote sensing estimates of crop acreage: (1) it
reduces the available sample at any one time, thus causing an
increase in variance of the estimate; (2) it may introduce bias
into the estimate if cloudiness is correlated with presence or
absence of the crop and no adjustment in sample design or
estimation procedure is made. At the segment level it is unlikely
that the cloud cover distribution is anything but random, and
LACIE procedures notably imply this assumption. However, at
the district or regional level, it is quite likely that cloud
cover distributions will exhibit marked patterns of spatial
correlation, which leads to the possibility of bias errors in
estimating crop acreage. For example, if segments in the state
of N. Dakota are frequently covered with clouds or free of
clouds simultaneously (if you lose one, you lose them all),
then any peculiarities of wheat culture in that part of the
country will be misrepresented in the sample. Although, perhaps,
nothing can be done about the missing remote-sensed data per
se,* the estimation procedure should compensate for the
effect. One way to do this would be to design the sample -
select the segments - with cloud cover as well as wheat culture
in mind.
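
The bias mechanism described above can be made concrete with a small
arithmetic sketch in Python. All numbers, region labels, and names
below are hypothetical and chosen only for illustration: two regions
with different wheat fractions, where clouds remove every segment of
one region at once.

```python
def weighted_mean_fraction(segments):
    """Weighted mean wheat fraction over (weight, wheat_fraction) pairs."""
    total_weight = sum(w for w, _ in segments)
    return sum(w * p for w, p in segments) / total_weight


# Hypothetical sample: (sampling weight, wheat fraction, region label).
sample = [
    (1.0, 0.60, "north"), (1.0, 0.60, "north"),
    (1.0, 0.20, "south"), (1.0, 0.20, "south"),
]

# Estimate using every sampled segment.
full = weighted_mean_fraction([(w, p) for w, p, _ in sample])

# Estimate when clouds cover all "north" segments simultaneously.
clear = weighted_mean_fraction(
    [(w, p) for w, p, region in sample if region != "north"]
)

# The cloud-screened estimate is pulled toward the "south" fraction.
bias = clear - full
```

In this constructed case the screened estimate understates the wheat
fraction because the lost region is the one with more wheat; if the
lost segments were a random subset, only the variance would grow.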


Figure D.1 Cloud Cover Statistics by Weather Region by Month
(axes: probability of cloud cover > C versus C, the cloud cover
area fraction)

Suggested Methodology: Obtain cloud cover statistics by weather
region by month (see Figure D.1). Determine theoretically a
threshold fraction (cloud cover area) $f_c$ for acceptance of a segment.

* Consideration could be given to the use of aircraft
to fill in gaps in the sample.




This fraction should be as large as possible consistent with the
classification procedures. Then stratify the sampling design
according to cloud cover in the same way that it would be
stratified for wheat growing. These stratifications can be
either in series or in parallel. For instance, the sampling may
be done in two stages. First the total list of all segments in
the country must be stratified according to the wheat-growing
practices. Then select segments from the wheat-growing
strata with probability proportional to size (amount of wheat
growing in the stratum historically). If $n_j$ is the sample size
in the j-th stratum, then

$$
N_1 = \sum_{j=1}^{s} n_j
$$

where s is the number of wheat-growing strata. These segments
must then be stratified again by cloud cover probabilities, i.e.,
the new strata are homogeneous weather regions. From each cloud
cover stratum, select some number of segments (p.p.s.)* and form
a subsample of size $N_2 < N_1$. Obviously, to obtain [illegible]
it is necessary to use a considerably larger first-stage sample
size, $N_1$. If $m_k$ is the sample size in the k-th cloud stratum,
then

$$
N_2 = \sum_{k=1}^{t} m_k
$$

where t is the number of cloud strata.

The approach outlined above is frequently employed in 
large surveys. It has the advantage of reducing bias in esti- 
mation of the key attributes, while maintaining sampling 
efficiency. 
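
The two-stage ("in series") design outlined above might be sketched in
Python as follows. This is a rough illustration only. The dictionary
keys, function names, and per-stratum sample sizes are hypothetical;
the size measure is assumed to be a per-segment historical wheat
figure, reused at the second stage because the text does not say what
size measure applies there; and the sequential weighted draw without
replacement is a simplification that only approximates strict p.p.s.
inclusion probabilities.

```python
import random
from collections import defaultdict


def pps_sample(units, size_of, n, rng):
    """Draw up to n distinct units; each draw is proportional to size."""
    pool = list(units)
    chosen = []
    for _ in range(min(n, len(pool))):
        sizes = [size_of(u) for u in pool]
        pick = rng.choices(range(len(pool)), weights=sizes, k=1)[0]
        chosen.append(pool.pop(pick))
    return chosen


def two_stage_sample(segments, n_per_wheat_stratum, m_per_cloud_stratum, seed=0):
    """Return (first_stage, second_stage) lists of segment dicts."""
    rng = random.Random(seed)

    # Stage 1: stratify by wheat-growing practice, draw n_j per stratum.
    by_wheat = defaultdict(list)
    for seg in segments:
        by_wheat[seg["wheat_stratum"]].append(seg)
    first = []
    for stratum in sorted(by_wheat):
        n_j = n_per_wheat_stratum.get(stratum, 0)
        first += pps_sample(by_wheat[stratum], lambda s: s["wheat_size"], n_j, rng)

    # Stage 2: restratify the first-stage sample by cloud cover region,
    # then draw m_k per cloud stratum.
    by_cloud = defaultdict(list)
    for seg in first:
        by_cloud[seg["cloud_stratum"]].append(seg)
    second = []
    for stratum in sorted(by_cloud):
        m_k = m_per_cloud_stratum.get(stratum, 0)
        second += pps_sample(by_cloud[stratum], lambda s: s["wheat_size"], m_k, rng)

    return first, second
```

Here len(first) plays the role of N_1 and len(second) the role of N_2,
which makes visible why the first stage must be drawn considerably
larger than the subsample finally wanted.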

* Probability proportional to size.



Effect of Spatial Correlations Between Neighboring 
Segments on Sample Design 


The estimates of wheat acreage are not directly
affected by spatial correlations between segments (as already
mentioned, they may be indirectly biased through cloud cover effects);
but the confidence intervals are affected, as the following
analysis shows.


Let $X_i$ = observed no. of "wheat" pixels in the i-th
segment; and $M_i$ = true no. of "wheat" pixels in the i-th
segment. Let

$$
X = \sum_{i=1}^{M} W_i X_i
$$

be the wheat acreage estimate, where the $W_i$ are
sampling weights. If the $X_i$ are independent binomial random
variables, then

$$
\operatorname{var}(X)
  = \sum_{i=1}^{M} W_i^2 \operatorname{var}(X_i)
  = \sum_{i=1}^{M} W_i^2 \sigma_i^2
  = \sum_{i=1}^{M} W_i^2 \, N \, \frac{M_i}{N} \left(1 - \frac{M_i}{N}\right)
  = \frac{1}{N} \sum_{i=1}^{M} W_i^2 M_i (N - M_i)
$$

Now suppose that the $X_i$ are dependent, in a specific pattern
indicated by the subscript differences as follows:

$$
E[(X_i - M_i)(X_j - M_j)] =
\begin{cases}
  \sigma_i \sigma_j \rho & \text{if } |i - j| = 1 \\
  0 & \text{if } |i - j| > 1
\end{cases}
$$

Then

$$
\operatorname{var}(X)
  = \sum_{i=1}^{M} W_i^2 \sigma_i^2
  + 2 \sum_{j=2}^{M} \sigma_{j-1} \sigma_j \, \rho \, W_{j-1} W_j
$$

where $\sigma_i^2$ is the same as before.
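
As a numerical check on the variance expressions above, the following
Python sketch computes the estimator variance for independent segments
and for segments whose immediate neighbors share a correlation. The
function names and the example values are hypothetical; the binomial
variance M_i (N - M_i) / N and the factor of 2 on the neighbor term
(each adjacent pair enters the covariance sum twice) are the standard
results assumed here.

```python
def binomial_variance(true_wheat, n_pixels):
    """Variance of a binomial count with N trials and mean M_i."""
    return true_wheat * (n_pixels - true_wheat) / n_pixels


def estimator_variance(weights, true_wheat, n_pixels, rho=0.0):
    """Variance of X = sum_i W_i X_i with correlation rho between neighbors."""
    sigma = [binomial_variance(m, n_pixels) ** 0.5 for m in true_wheat]
    # Independent part: sum of W_i^2 sigma_i^2.
    independent = sum(w * w * s * s for w, s in zip(weights, sigma))
    # Neighbor part: 2 * rho * sum over adjacent pairs (j-1, j).
    neighbor = 2.0 * rho * sum(
        sigma[j - 1] * sigma[j] * weights[j - 1] * weights[j]
        for j in range(1, len(weights))
    )
    return independent + neighbor
```

For example, with three equally weighted segments of 100 pixels, each
truly half wheat, the variance is 75 when rho = 0 and 125 when
rho = 0.5. Setting the middle weight to zero (that segment is not
selected) removes the neighbor term whatever the value of rho.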






Figure D.2 Numbered Segments in a Wheat "Belt"

The last term is non-negative if $\rho > 0$,* so that the pattern of
spatial correlations has caused that much increase in variance
of the wheat acreage estimator.

Had this particular spatial correlation pattern been
known in advance, even if the size of $\rho$ were unknown, one could
have placed a constraint on the sampling plan*: do not select
$S_{j-1}$ or $S_{j+1}$ if $S_j$ is selected. For the same size sample this
constraint would have increased efficiency (narrower confidence
limits) because it would have eliminated the term with $\rho$ in it by
causing $W_{j-1} W_j = 0$ for all j = 2, ..., M. The conclusion is that
a study of the spatial correlations would generally improve the
sampling efficiency.
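
One way to impose the "no neighboring segments" constraint discussed
above is sketched below in Python. It assumes the segments are numbered
consecutively along a belt, as in Figure D.2, and that every admissible
sample is equally likely. Both are simplifications made for
illustration, since the text does not specify a selection mechanism,
and the function name is hypothetical.

```python
import random


def sample_non_adjacent(n_segments, sample_size, seed=0):
    """Pick sample_size segment numbers from 1..n_segments, none consecutive."""
    slots = n_segments - sample_size + 1
    if sample_size > slots:
        raise ValueError("too many segments requested for a no-neighbor sample")
    rng = random.Random(seed)
    base = sorted(rng.sample(range(1, slots + 1), sample_size))
    # Shifting the r-th smallest pick by r leaves a gap of at least one
    # unselected segment between any two selected segments.
    return [b + r for r, b in enumerate(base)]
```

Since no two selected segment numbers differ by 1, every product of
neighboring weights is zero and the correlation term drops out of the
variance.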

* The occurrence of $\rho < 0$ for spatial phenomena of this
type is not plausible: it would involve the implication
that wheat acreage is lower in the "neighboring" segment
if it is higher in this segment. However, for widely
separated segments this could occur for economic reasons.
The full treatment of this subject would require that
advantage be taken of the entire correlation matrix,
if known.






